Underwood receives NEH grant to investigate consequences of error in digital libraries

Professor Ted Underwood has received a $73,122 grant from the National Endowment for the Humanities to investigate the consequences of error in digital libraries. While digital libraries represent an immense storehouse of knowledge, the texts are full of errors because of the imperfect process by which they are transcribed optically.

"It isn't unusual for five percent of the words in volumes to be mistranscribed, with the level of error much higher in some volumes," said Underwood. "Simply measuring the fraction of mistranscribed words is easy. It’s harder to know how much difference those errors make for the methods and questions that actually interest researchers. Some forms of analysis are undisturbed by high levels of error; others may be quite sensitive, especially when errors are distributed unevenly across different historical periods and genres."

Underwood will work with graduate students from the iSchool and English Department to construct parallel collections that pair each "clean" text with a realistically error-ridden version of the same book drawn from a digital library. The team will build collections of Chinese texts as well as English texts ranging from 1700 to the present, because different character sets and printing technologies produce different kinds of error. Then the team will apply a wide range of data-mining methods to both the clean and error-ridden collections and measure the distortion produced by transcription error and other common sources of noise. The project will provide tools that help other researchers estimate the level of uncertainty in their own conclusions.

"No data is perfect. There's always some kind of error. The question is whether the error is of a kind and magnitude likely to matter for a particular question," he said.

Underwood is a professor in the iSchool and also holds an appointment with the Department of English in the College of Liberal Arts and Sciences. He has authored three books about literary history: Distant Horizons (University of Chicago Press, 2019), Why Literary Periods Mattered: Historical Contrast and the Prestige of English Studies (Stanford University Press, 2013), and The Work of the Sun: Literature, Science and Political Economy 1760-1860 (Palgrave, 2005). His articles have appeared in PMLA, Representations, MLQ, and Cultural Analytics. Underwood earned his PhD in English from Cornell University.
