Text Mining the Novel: Establishing the Foundations of a New Discipline

  • Ted Underwood
  • J. Stephen Downie

This HathiTrust Research Center (HTRC) project seeks to produce the first large-scale, cross-cultural study of the novel using quantitative methods. Since its putative rise in the eighteenth century, the novel has been a central means of expressing what it means to be modern. Yet despite this cultural significance, we still lack a comprehensive study of the novel’s place within society that accounts for the vast quantity of novels produced since the eighteenth century, the period most often identified as the origin of the novel’s quantitative rise. Our aim is thus twofold: 1) to enliven our understanding of one of the most culturally significant modern art forms through new computational means, and 2) to establish the methodological foundations of a new disciplinary formation. Text mining is arguably one of the most important fields driving growth, innovation, and even citizenship within a modern information economy. This partnership seeks to bring the unique knowledge of literary studies to bear on larger debates about text mining and the place of information technology within society. In so doing, it will shape how we think about the nature of reading and the way we increasingly access our cultural heritage today.

The research team includes ten literary historians from leading digital humanities institutes across North America; seven collaborators from nonliterary disciplines who lead major text mining initiatives in their respective fields; and four representatives from some of the most significant digital content initiatives today: the HathiTrust Research Center, which represents the largest collection of digitized literary material in the world; Compute Canada, the computational backbone for all data-driven research in Canada; the commercial initiative Gale Cengage Learning, a world leader in digital publishing; and the metaLab at Harvard University, a leader in developing new publishing platforms and information design.

Learn more at http://novel-tm.ca/.

Funding Agencies

  • Social Sciences and Humanities Research Council of Canada, 2014 – $142,000.00