The Distance Machine
This project aims to help scholars and students better understand the linguistic contexts surrounding nineteenth-century texts in the English language. Its central component is a Web application called the Distance Machine, which relates a particular text to quantitative data about broad changes in the language. Ultimately, the project seeks to help researchers in nineteenth-century literature understand the political, intellectual, and cultural implications that particular words had in the past and that have since been lost as the words came into more general use.
The Distance Machine was inspired by the observation that many nineteenth-century texts contain words that would have stood out as unusual to a reader at the time but that have since become standard. Using Google’s Ngrams data and a specially designed statistical model, it determines the approximate year when certain words, such as thou, ceased to be common, and when others, such as computer, came into frequent use. Using this information, it shows you (right in your Web browser) how the words of a particular text fit in with these overall trends, indicating both archaic words and words that, while now common, were unusual in books from a particular point in the past.
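To make the idea in the paragraph above concrete, here is a minimal sketch in Python. It is not the Distance Machine's statistical model, which is not described here. It substitutes a much simpler technique: smooth a word's yearly frequency with a moving average, then report the first and last years in which the smoothed frequency stays at or above a threshold. The function names, the threshold, and the frequency figures are all invented for illustration and are not taken from Google's Ngrams data.

```python
from typing import Dict, Optional, Tuple

# Invented frequencies (occurrences per million words), NOT real Ngrams values.
THOU = {1800: 50.0, 1820: 40.0, 1840: 30.0, 1860: 12.0, 1880: 6.0, 1900: 3.0}
COMPUTER = {1900: 0.0, 1920: 0.0, 1940: 1.0, 1960: 20.0, 1980: 80.0, 2000: 120.0}


def moving_average(series: Dict[int, float], window: int = 3) -> Dict[int, float]:
    """Smooth a yearly series by averaging each point with its neighbors.

    The window counts data points, not calendar years, and is truncated
    at both ends of the series.
    """
    years = sorted(series)
    half = window // 2
    smoothed = {}
    for i, year in enumerate(years):
        neighbors = years[max(0, i - half): i + half + 1]
        smoothed[year] = sum(series[y] for y in neighbors) / len(neighbors)
    return smoothed


def common_period(series: Dict[int, float], threshold: float,
                  window: int = 3) -> Optional[Tuple[int, int]]:
    """Return (first, last) year with smoothed frequency >= threshold, or None."""
    smoothed = moving_average(series, window)
    common_years = [y for y in sorted(smoothed) if smoothed[y] >= threshold]
    if not common_years:
        return None
    return common_years[0], common_years[-1]


def label_word(period: Optional[Tuple[int, int]], text_year: int,
               last_data_year: int) -> str:
    """Label a word relative to the year of a text (hypothetical categories)."""
    if period is None:
        return "never common"
    first, last = period
    if first > text_year:
        return "not yet common in the text's time"
    if last < last_data_year:
        return "archaic today"
    return "common then and now"
```

With the invented figures above and a threshold of 10 per million, `common_period(THOU, 10.0)` gives `(1800, 1860)` and `common_period(COMPUTER, 10.0)` gives `(1960, 2000)`. For a text dated 1850, `label_word` would then mark thou as archaic today and computer as not yet common in the text's time. A hard threshold is sensitive to noise and to the cutoff chosen, which is one reason a purpose-built statistical model is preferable in practice.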
Although this tool can lead to some surprising insights into nineteenth-century texts, interpreting the results requires an understanding of the ways in which the meanings, connotations, and registers of words have changed over time. A sense of context is especially important with regard to nineteenth-century America, where local dialect differences, “Americanisms,” and distinctions between refined and vulgar speech were subjects of heated debate. To help users understand these issues, the tool can also display entries from several dictionaries, ranging from the nineteenth century to the present, and highlight words in a text that are marked as “vulgar” or “improper” in particular dictionaries.