Underwood receives NEH grant to investigate consequences of error in digital libraries

Ted Underwood, Professor and Associate Dean for Academic Affairs

Professor Ted Underwood has received a $73,122 grant from the National Endowment for the Humanities to investigate the consequences of error in digital libraries. While digital libraries represent an immense storehouse of knowledge, the texts are full of errors because optical character recognition (OCR), the process used to transcribe them, is imperfect.

"It isn't unusual for five percent of the words in volumes to be mistranscribed, with the level of error much higher in some volumes," said Underwood. "Simply measuring the fraction of mistranscribed words is easy. It’s harder to know how much difference those errors make for the methods and questions that actually interest researchers. Some forms of analysis are undisturbed by high levels of error; others may be quite sensitive, especially when errors are distributed unevenly across different historical periods and genres."

Underwood will work with graduate students from the iSchool and English Department to construct parallel collections that pair each "clean" text with a realistically error-ridden version of the same book drawn from a digital library. The team will build collections of Chinese texts as well as English texts ranging from 1700 to the present, because different character sets and printing technologies produce different kinds of error. Then the team will apply a wide range of data-mining methods to both the clean and error-ridden collections and measure the distortion produced by transcription error and other common sources of noise. The project will provide tools that help other researchers estimate the level of uncertainty in their own conclusions.
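The comparison described above can be illustrated with a small Python sketch. This is not the project's code: the function names, the two sample sentences, and the choice of word frequency as the "downstream" measure are all assumptions made for illustration. It pairs a clean passage with a noisy transcription, estimates the fraction of clean words lost to transcription error, and then checks how much a simple statistic shifts as a result.

```python
from difflib import SequenceMatcher


def mismatch_fraction(clean: str, noisy: str) -> float:
    """Fraction of clean-text words with no aligned match in the noisy text.

    A simple alignment-based estimate (hypothetical helper), not the
    standard edit-distance word error rate.
    """
    clean_words = clean.lower().split()
    noisy_words = noisy.lower().split()
    if not clean_words:
        return 0.0
    matcher = SequenceMatcher(a=clean_words, b=noisy_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 1.0 - matched / len(clean_words)


def relative_frequency(text: str, word: str) -> float:
    """Share of the words in `text` that are exactly `word`."""
    words = text.lower().split()
    return words.count(word) / len(words) if words else 0.0


# Invented sample pair: the noisy version imitates typical OCR confusions
# (long s read as "f", "h" read as "b").
clean = "the sun rose over the silent sea and the ship sailed on"
noisy = "the fun rose over tbe silent sea and the ship failed on"

error_level = mismatch_fraction(clean, noisy)
distortion = abs(relative_frequency(clean, "the") - relative_frequency(noisy, "the"))

print(f"mismatched words: {error_level:.0%}")
print(f"shift in relative frequency of 'the': {distortion:.3f}")
```

The first number is the easy measurement; the second stands in for the harder question of how much a given analysis is disturbed. A real study would replace the word-frequency check with the data-mining method of interest and run it over whole collections rather than a single sentence.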

"No data is perfect. There's always some kind of error. The question is whether the error is of a kind and magnitude likely to matter for a particular question," he said.

Underwood is a professor in the iSchool and also holds an appointment with the Department of English in the College of Liberal Arts and Sciences. He has authored three books about literary history: Distant Horizons (University of Chicago Press, 2019), Why Literary Periods Mattered: Historical Contrast and the Prestige of English Studies (Stanford University Press, 2013), and The Work of the Sun: Literature, Science and Political Economy 1760-1860 (New York: Palgrave, 2005). His articles have appeared in PMLA, Representations, MLQ, and Cultural Analytics. Underwood earned his PhD in English from Cornell University.


Related News

Cheng defends dissertation

Doctoral candidate Jessica Cheng successfully defended her dissertation, "Agreeing to Disagree: Applying a Logic-based Approach to Reconciling and Merging Multiple Taxonomies," on May 25. 


Student award recipients announced

Each year, the School of Information Sciences recognizes a group of outstanding students for their achievement in academics as well as a number of attributes that contribute to professional success. Congratulations to this year's honorees!


Brooks presents keynote at West African conference

Ian Brooks, iSchool research scientist and director of the Center for Health Informatics (CHI), gave a keynote talk at the West Africa Conference on Digital Public Goods and Cybersecurity, which was held on May 9-10 in Freetown, Sierra Leone. The conference focused on bridging the gender gap in digital public goods and cybersecurity spaces in Africa.


New project to help identify and predict insider threats

Insider threats are one of the top security concerns facing large organizations. Current and former employees, business partners, contractors—anyone with the right level of access to a company’s data—can pose a threat. The incidence of insider threats has increased in recent years, at a significant cost to companies. Associate Professor Jingrui He is addressing this problem in a new project that seeks to detect and predict insider threats. She has been awarded a three-year, $200,000 grant from the C3.ai Digital Transformation Institute for her project, "Multi-Facet Rare Event Modeling of Adaptive Insider Threats."


iSchool students present their research at Urbana City Council meeting

At the Urbana City Council meeting on May 9, students in the Community Data (IS 594) course presented their research on how communities are reducing gun violence. According to their instructor Chamee Yang, postdoctoral research associate with the iSchool, Community Data Clinic, and Just Infrastructures Initiative, the new course was designed as an experiential learning opportunity with a community engagement component, where students could gain research experience with real-world implications. Throughout the Spring 2022 semester, students worked in groups to explore community-driven approaches to prevent gun violence.

Chamee Yang, Sarah Unruh, and Gowri Balasubramaniam