Something I Am Embarrassed to Say I Just Learned

By Mark Ward

September 22, 2016

I knew that English Bible translators have access to a computerized linguistic corpus—an unbelievably massive collection of English texts—to help them do their work.

What I didn’t know, what I just learned, is that I do, too.

What you’re about to learn, if you didn’t already know it, is that so do you.

So I chose to get involved in some online discussion about the KJV, and I’m glad I did. I was talking to some intelligent guys who kept me on my toes. I pointed out to them, in a broader argument about the readability of the KJV, that “dropsy” (Luke 14:2) is an archaic word liable to cause today’s readers to draw a blank. The word is very old, first citation 1290—though, of course, that doesn’t necessarily mean it’s archaic (sack is also very old, but not archaic). But my sense was that “dropsy” just doesn’t get used today.

One of my interlocutors pointed out, however (and touché to him), that my beloved ESV uses the word, too! (He could have added that the NASB uses it as well.) I had not realized this, and I was initially surprised.

However, being a denizen of the Internet and therefore rarely being one to admit fault, I determined to do some poking around. Standard contemporary dictionaries weren’t enough help. Merriam-Webster told me only that “dropsy” means “edema.” American Heritage said the word is “no longer in scientific use,” but didn’t elaborate. Is it archaic? Should the ESV and NASB have used it? I didn’t know yet. Even if the word has dropsied right out of science, maybe it has landed in the speech of the common man.

So I checked Google’s NGram Viewer, and this is what I found:

Right after 1900, “edema” clearly changes places with “dropsy.” I’m not sure why there are massive spikes, and a big drop in the “edema” line starting sometime before the year 2000. I’m also not sure how much to trust Google NGram Viewer—I simply don’t know whether the corpus it’s searching (Google Books) is a truly representative sample. I’m not confident that I’m interpreting the graphs correctly. Perhaps the relative difference is huge, but the actual difference is not. Stats are tricky.
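To make that last worry concrete, here is a minimal sketch in Python. Every number in it is invented for illustration: the corpus size and both word counts are hypothetical, not figures from Google Books or any real corpus, and `per_million` is just a helper name I made up. The point is only that one word can be many times more common than another while both remain rare in absolute terms.

```python
def per_million(count, corpus_size):
    """Return how often a word occurs per million words of text."""
    return count / corpus_size * 1_000_000

# Hypothetical numbers, invented purely for illustration.
corpus_size = 500_000_000          # total words in an imaginary corpus
counts = {"edema": 2_000, "dropsy": 100}

# Normalize the raw counts so they can be compared across corpora.
rates = {word: per_million(n, corpus_size) for word, n in counts.items()}

# Relative difference: how many times more common is one word?
ratio = rates["edema"] / rates["dropsy"]

# Absolute difference: how many more occurrences per million words?
gap = rates["edema"] - rates["dropsy"]

print(f"edema:  {rates['edema']:.1f} per million words")
print(f"dropsy: {rates['dropsy']:.1f} per million words")
print(f"ratio:  about {ratio:.0f} to 1")
print(f"gap:    about {gap:.1f} per million words")
```

With these made-up inputs the ratio looks dramatic, yet the gap amounts to only a few occurrences per million words. A graph that rescales its axis to fit the data can make a small absolute difference look enormous.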

Then it hit me: I wonder if there’s an online English corpus available freely, designed for precisely my question, and focused on contemporary English, the kind of corpus I’ve heard Doug Moo talk about, which he used for the NIV. I searched for “english corpus,” and as they say in Telugu, voilà. I discovered BYU’s Corpus of Contemporary American English (COCA). It provides a massive, curated database, balanced across different types of American speech and writing. It’s composed of roughly equal parts spoken, fiction, magazine, newspaper, and academic English. Wow.

There are actually multiple English corpora at the site, and they “allow research on variation—historical, between dialects, and between genres—in ways that are not possible with other corpora.”

So, COCA, what’s a more common word in contemporary English, “dropsy” or “edema”? There’s a very clear winner. But if I give you a fish you’ll only eat for today. Go see if you can figure it out yourself.