Looking It Up: Dictionary Data in a Digital Age

March 12, 2013 Lisa Peet

You never see a discussion of online dictionaries without someone invoking the magical powers of browsing. It’s true, of course—who hasn’t discovered a really good word while looking for something else? You can’t argue with the expediency of the electronic search, but it does seem a shame to sacrifice that potential in the process. According to an article in the Chronicle of Higher Education on dictionaries in the digital era, though, that serendipity might not end up completely lost in the next generation of online reference. A number of dictionaries and encyclopedias already offer a list of words in the alphabetical vicinity of the search term that can be scrolled through; whether that mimics the kind of eyetracks over a whole page needed to stimulate fortuitous word acquisition remains to be seen. I have my doubts. But the new OED has also integrated its historical thesaurus, which might lead to an entirely new world of accidental finds. Ben Zimmer, proprietor of the Visual Thesaurus, calls it a “kind of blossoming map of words and meaning…. Dictionaries are not just static entities anymore.”

And the goods go in both directions. Lexicographers can see what we’re searching for, and that data makes itself useful in all sorts of ways. Dictionary editors can identify trending words, patterns, and usage that might not ordinarily show up on their radar, the better to refine future editions:

“Unsuccessful word ‘look-ups,’ or searches that don’t produce satisfying results, can point lexicographers to terms that haven’t yet made their way into a particular dictionary or whose definitions need to be amended or freshened. Online readers can click a button and contribute their own word lore, extending a tradition that dates back at least as far as the late 19th century, when James Murray and his team compiled the first Oxford English Dictionary with the help of thousands of word slips sent in by the public.”

Merriam-Webster, at least, has no plans to make its look-up data public. That may be good news for everyone playing Words With Friends at the office, but less so for linguists and humanists who might benefit from such an extensive real-time corpus. Editor Peter Sokolowski says no researchers have asked (seriously?) and offers, maybe more realistically, that it’s “valuable proprietary data, and we do not make it freely available to the public.” But hope springs eternal: he adds that “it is conceivable that under the right circumstances we might try to find ways to work with qualified researchers and scholars.” Given the utility of open linguistic data-mining tools such as Google’s Ngram Viewer or the Corpus of Contemporary American English, the folks at Merriam-Webster or the OED have the potential to open up some real riches to scholars and curious amateurs alike. That would be a reasonable tradeoff, maybe, for giving up the chance to discover lacustrine when all you were doing was trying to figure out how to spell lackluster.

(Image is of Brian Dettmer’s altered book, New International Dictionary, 2003.)