Distant reading

Distant reading is an approach in literary studies that applies computational methods to literary data, usually derived from large digital libraries, for the purposes of literary history and theory.

The term is a collective one, used to refer to a range of computational methods for analysing literary data. Similar or overlapping approaches include macroanalysis, cultural analytics, computational formalism, computational literary studies, quantitative literary studies, and algorithmic literary criticism.


The term "distant reading" is generally attributed to Franco Moretti and his 2000 article, "Conjectures on World Literature".[1] In the article, Moretti proposed a mode of reading that included works outside established literary canons, which he variously termed "the great unread"[2] and, elsewhere, "the Slaughterhouse of Literature".[3] The article's innovation, as far as literary studies was concerned, was that the method employed samples, statistics, paratexts, and other features not often considered within the ambit of literary analysis. Moretti also established a direct opposition to the theory and methods of close reading: "One thing for sure: it cannot mean the very close reading of very few texts—secularized theology, really ('canon'!)—that has radiated from the cheerful town of New Haven over the whole field of literary studies".[4]

However, Moretti initially conceived distant reading as the analysis of secondary literature, a roundabout way of getting to know more about primary literature: "[literary history] will become 'second-hand': a patchwork of other people's research, without a single direct textual reading".[2] Only later did the term distant reading (via Moretti and other scholars) come to be identified primarily with the computational analysis of primary literary sources.

Despite the consensus that distant reading originated at the turn of the twenty-first century, Ted Underwood has traced a longer genealogy of the method, arguing that this genealogy has been elided in current discourse about distant reading. He writes that "distant reading has a largely distinct genealogy stretching back many decades before the advent of the internet—a genealogy that is not for the most part centrally concerned with computers".[5] Underwood emphasises a social-scientific dimension in this prehistory of distant reading, referring to particular examples in the work of Raymond Williams (from the 1960s) and Janice Radway (from the 1980s).

Moretti's conception of literary evolution in Distant Reading is quite similar to the psychologist Colin Martindale's "scientific", computational, neo-Darwinist project of literary evolution in The Clockwork Muse (1990), and both Martindale and Moretti downplay the role of reading. According to Martindale, the principles of the evolution of art are based on statistical regularities rather than meaning, data or observation: "So far as the engines of history are concerned, meaning does not matter. In principle, one could study the history of a literary tradition without reading any of literature. ... the main virtue of the computerized content analysis methods I use is that they save one from actually having to read the literature" (p. 14).

This variety in the stated definitions and aims of distant reading is characteristic of its development since the turn of the twenty-first century, during which it has come to encompass a range of methods and approaches rather than a single, unified method of literary study.

Principles and practice

One of the central principles of distant reading is that literary history and literary criticism can be written without necessarily resorting to the kind of careful, sustained reading encounter with individual texts that is fundamental to close reading.

Commonly, distant reading is performed at scale, using a large collection of texts. However, some scholars have adopted the principles of distant reading in the analysis of a small number of texts or an individual text.[6] Distant reading often shares with the Annales school a focus on the analysis of long-term histories and trends. Empirical approaches to literary study are a regular characteristic of distant reading, and are often accompanied by a reliance on quantitative methods. Moretti has described the concept of 'operationalizing' as "absolutely central to the new field of computational criticism"[7] that includes distant reading. This principle, for Moretti, consists of "building a bridge from concepts to measurement, and then to the world" (104), underscoring the combined interests of empirical and quantitative study at its heart. In practice, distant reading has been undertaken with the aid of computers in the twenty-first century (though Underwood has argued for prominent non-computational precursors[8]); however, some works combining scale and literary study have been described as "distant-reading-by-hand".[9]

Criticisms of distant reading

Stanley Fish takes a broad view of what he frames as problems of interpretation in the digital humanities, but the specific example he isolates for critique is informed by his impression of distant reading methodology: "first you run the numbers, and then you see if they prompt an interpretive hypothesis. The method, if it can be called that, is dictated by the capability of the tool".[10] In a similar vein, Stephen Marche focuses on the prospects for interpretation within the framework of computational literary analysis in an article which begins with the provocation, "[b]ig data is coming for your books".[11] Though he initially describes distant reading as the "most promising path, at least on the surface"[11] among the range of digital humanities methods he surveys, he concludes that the generalisations he perceives in the method are ineffective when "applied to literary questions proper".[11]

Additional critiques of distant reading have come from postcolonial theorists. Gayatri Spivak is unconvinced by distant reading's claims to represent the perspectives of the "great unread", asking "[s]hould our only ambition be to create authoritative totalizing patterns depending on untested statements by small groups of people treated as native informants?".[12] Jonathan Arac questions the "unavowed imperialism of English"[13] in Moretti's work.

Examples


In "Style, Inc. Reflections on Seven Thousand Titles (British Novels, 1740–1850)",[14] Franco Moretti uses an early distant reading methodology to analyse changes in the titles of British novels over the period. In the absence of dedicated corpora of these novels' texts, Moretti argues that "titles are still the best way to go beyond the 1 percent of novels that make up the canon, and catch a glimpse of the literary field as a whole".[14] In the article, Moretti combines the results of quantitative analysis of these titles with contextual knowledge of literary history to address questions about the shortening of eighteenth-century novel titles, about the nature of very short novel titles, and about the relationship of novel titles to genres. For example, in Section I, he provides evidence of the decreasing length of titles across the time span, and links the phenomenon to the growth of the market for novels and the establishment of periodicals which regularly reviewed novels.
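The kind of measurement described here, title length tracked over time, can be illustrated with a minimal Python sketch. It is not Moretti's procedure: the function name, the choice to count whitespace-separated words, the grouping by decade, and the placeholder records are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def mean_title_length_by_decade(records):
    """Return {decade: mean number of words per title}.

    `records` is an iterable of (year, title) pairs. Counting
    whitespace-separated words is an assumed, simplistic measure.
    """
    lengths = defaultdict(list)
    for year, title in records:
        decade = (year // 10) * 10
        lengths[decade].append(len(title.split()))
    return {decade: mean(values) for decade, values in sorted(lengths.items())}


# Invented placeholder records, NOT drawn from Moretti's data.
sample_records = [
    (1741, "one two three four five six"),
    (1748, "one two three four"),
    (1832, "one two"),
    (1835, "one"),
]

result = mean_title_length_by_decade(sample_records)
print(result)
```

With a real bibliography in place of the placeholder records, a falling series of decade means would correspond to the shortening of titles that the article discusses.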

In "Why Literary Time is Measured in Minutes",[15] Ted Underwood asks "Why are short spans of time so central to our discipline? ... Why is experience measured in seconds or minutes more appropriately literary than experience measured in weeks or months?".[16] Methodologically, Underwood supplements theoretical ideas about the compression of fictional time with approaches from distant reading which model the average lengths of time described in 250-word portions of fiction across three centuries. Having also combined quantitative findings with close reading, Underwood concludes his article with a discussion of the integration of quantitative methods into literary study, writing: "I see close readings and statistical models not as competing epistemologies but as interlocking modes of interpretation that excel at different scales of analysis".[17]
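As a rough illustration of the unit of analysis mentioned here, the following Python sketch splits a text into 250-word portions and averages a set of per-portion duration estimates. It does not reproduce Underwood's model: how the duration of each portion is estimated is left out entirely, and the function names and the sample values are hypothetical.

```python
from statistics import mean


def split_into_portions(text, size=250):
    """Split text into consecutive portions of at most `size` words."""
    words = text.split()
    return [words[i:i + size] for i in range(0, len(words), size)]


def average_duration(estimates_in_minutes):
    """Average per-portion estimates of narrated time (in minutes).

    The estimates themselves must come from elsewhere, for example
    from human readers or a trained model; that step is not shown.
    """
    return mean(estimates_in_minutes)


# Placeholder text of 600 identical words, used only to show the split.
sample_text = " ".join(["word"] * 600)
portions = split_into_portions(sample_text)
portion_sizes = [len(portion) for portion in portions]
print(portion_sizes)
```

The final portion is shorter than 250 words whenever the text length is not an exact multiple of the portion size; whether to keep or discard such remainders is a design choice the sketch leaves open.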

In their Literary Lab pamphlet, "A Quantitative Literary History of 2,958 Nineteenth-Century British Novels: The Semantic Cohort Method",[18] Ryan Heuser and Long Le-Khac analyse word usage within their corpus to argue for a "systemic concretization of language and fundamental change in the social spaces of the novel".[19] Their analysis demonstrates a change in the way in which concrete detail is presented across the span of the nineteenth century, with an observable shift in the novel's narrative style "from telling to showing"[20] as the century develops. The findings tally with many literary-critical writings about the change in nineteenth-century narrative style from realism to modernism.
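A simple way to picture word-usage analysis of this kind is to measure what share of a text's words belongs to a chosen group of words. The Python sketch below does only that; it is not the Semantic Cohort Method itself, and the function name, the tokenisation rule, and the example word group are assumptions for illustration.

```python
import re
from collections import Counter


def cohort_share(text, cohort):
    """Return the fraction of tokens in `text` that belong to `cohort`.

    Tokens are lower-cased runs of letters and apostrophes, an assumed
    and deliberately crude tokenisation.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(counts[word] for word in cohort) / len(tokens)


# Hypothetical word group and sentence, chosen only for illustration.
example_cohort = {"red", "hard", "stone"}
share = cohort_share("The red door and the hard stone", example_cohort)
print(round(share, 3))
```

Computing this share for each novel and averaging by publication decade would give a time series of the sort that studies of changing word usage rely on.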

Lauren F. Klein trains methods from computational linguistics and data visualisation on an archive of slavery, in her article, "The Image of Absence: Archival Silence, Data Visualization, and James Hemings",[21] in order to present examples of how distant reading can uncover and illuminate "the silences endemic to the archive of American slavery".[22] Searching for archival traces of James Hemings, Thomas Jefferson's enslaved chef, Klein juxtaposes visualisations of his presence with Jefferson's own charts and tables as the basis for a discussion of data visualisation as it relates to the construction of race.

The COST Action 'Distant Reading for European Literary History'[23] is a European networking project bringing together scholars interested in corpus building, quantitative text analysis, and European literary history. It aims to create a network of researchers jointly developing the distant reading resources and methods necessary to change the way European literary history is written. The objectives of the project include coordinating the creation of a multilingual European Literary Text Collection (ELTeC)[24] containing digital full-texts of novels in different European languages.


  1. ^ Moretti, Franco (2000). "Conjectures on World Literature". New Left Review. 1.
  2. ^ a b Moretti, Franco (2000). "Conjectures on World Literature". New Left Review. 1: 55.
  3. ^ Moretti, Franco (2000). "The Slaughterhouse of Literature". Modern Language Quarterly. 61 (1): 207. doi:10.1215/00267929-61-1-207. S2CID 161329715.
  4. ^ Moretti, Franco (2000). "The Slaughterhouse of Literature". Modern Language Quarterly. 61 (1): 208. doi:10.1215/00267929-61-1-207. S2CID 161329715.
  5. ^ Underwood, Ted (2017). "A Genealogy of Distant Reading". Digital Humanities Quarterly. 11 (2).
  6. ^ Eve, Martin Paul (2017). "Close Reading with Computers: Genre Signals, Parts of Speech, and David Mitchell's Cloud Atlas". SubStance. 46 (3). doi:10.3368/ss.46.3.76. S2CID 54614638.
  7. ^ Moretti, Franco (2013). "'Operationalizing': Or, the Function of Measurement in Literary Theory". New Left Review. 84: 103.
  8. ^ Underwood, Ted (2017). "A Genealogy of Distant Reading". Digital Humanities Quarterly. 11 (2).
  9. ^ Pasanek, Brad (2015). Metaphors of Mind: An Eighteenth-Century Dictionary. Baltimore: Johns Hopkins University Press. ISBN 9781421416885.
  10. ^ Fish, Stanley (23 Jan 2012). "Mind Your P's and B's: The Digital Humanities and Interpretation". New York Times.
  11. ^ a b c Marche, Stephen (28 Oct 2012). "Literature Is not Data: Against Digital Humanities". Los Angeles Review of Books.
  12. ^ Spivak, Gayatri Chakravorty (2005). Death of a Discipline. Columbia University Press. pp. 107–8. ISBN 9780231129459.
  13. ^ Arac, Jonathan (2002). "Anglo-Globalism?". New Left Review. 16: 44.
  14. ^ a b Moretti, Franco (2009). "Style, Inc. Reflections on Seven Thousand Titles (British Novels, 1740–1850)". Critical Inquiry. 36 (1): 134–158. doi:10.1086/605619. JSTOR 10.1086/606125.
  15. ^ Underwood, Ted (2018). "Why Literary Time is Measured in Minutes". ELH. 85 (2): 341–365. doi:10.1353/elh.2018.0013. hdl:2142/100076. S2CID 192215143.
  16. ^ Underwood, Ted (2018). "Why Literary Time is Measured in Minutes". ELH. 85 (2): 342. doi:10.1353/elh.2018.0013. hdl:2142/100076. S2CID 192215143.
  17. ^ Underwood, Ted (2018). "Why Literary Time is Measured in Minutes". ELH. 85 (2): 363. doi:10.1353/elh.2018.0013. hdl:2142/100076. S2CID 192215143.
  18. ^ Heuser, Ryan; Le-Khac, Long (2012). "A Quantitative Literary History of 2,958 Nineteenth-Century British Novels: The Semantic Cohort Method" (PDF). Pamphlets of the Stanford Literary Lab. 4.
  19. ^ Heuser, Ryan; Le-Khac, Long (2012). "A Quantitative Literary History of 2,958 Nineteenth-Century British Novels: The Semantic Cohort Method" (PDF). Pamphlets of the Stanford Literary Lab. 4: 2.
  20. ^ Heuser, Ryan; Le-Khac, Long (2012). "A Quantitative Literary History of 2,958 Nineteenth-Century British Novels: The Semantic Cohort Method" (PDF). Pamphlets of the Stanford Literary Lab. 4: 45.
  21. ^ Klein, Lauren F. (2013). "The Image of Absence: Archival Silence, Data Visualization, and James Hemings". American Literature. 85 (4): 661–688. doi:10.1215/00029831-2367310.
  22. ^ Klein, Lauren F. (2013). "The Image of Absence: Archival Silence, Data Visualization, and James Hemings". American Literature. 85 (4): 661. doi:10.1215/00029831-2367310.
  23. ^ "Distant Reading for European Literary History". Distant Reading.
  24. ^ "ELTeC: European Literary Text Collection". Distant Reading.