School of Information Sciences

Downie presents TORCHLITE in Germany

J. Stephen Downie, Professor, Executive Associate Dean, and Co-Director of the HathiTrust Research Center

This week, Professor and Executive Associate Dean J. Stephen Downie was a guest speaker at the Herder Institute in Marburg and the University of Göttingen. Downie, who serves as co-director of the HathiTrust Research Center (HTRC), lectured on the HTRC's "Tools for Open Research and Computation with HathiTrust: Leveraging Intelligent Text Extraction" (TORCHLITE) project.

The HTRC facilitates nonprofit and educational uses of the HathiTrust Digital Library (HTDL) by enabling computational analysis of the library's 19 million volumes, of which around 10 million are under copyright restrictions. Funded by the National Endowment for the Humanities from 2022 through 2024, TORCHLITE created easy-to-use text analysis tools, dashboards, and application programming interfaces, all of which remain active and available, to facilitate open cultural analytics research using the uniquely valuable HTDL data.

The data of interest is contained in HTRC's flagship "Extracted Features" (EF) dataset, which consists of rich metadata and statistical information algorithmically derived from the digitized texts of the entire HathiTrust corpus. The dataset documents every word on every page, including the number of times the word appears, its part of speech, and other formal features of the language on the page. The EF dataset, and methods for computing over it, have enabled many forms of full-text analysis, even of copyrighted materials. It contains nearly 3 trillion tokens (roughly speaking, words) representing more than 6 billion pages of text, making it arguably the largest open dataset of its kind that is readily available to researchers around the world.
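As a rough illustration of what page-level word counts make possible, the following minimal Python sketch sums per-page counts into volume-level totals. The record layout, field names, and sample values are hypothetical and chosen only for illustration; they are not the actual EF schema or real HathiTrust data.

```python
from collections import Counter

# Hypothetical, simplified page records: token -> {part-of-speech tag -> count}.
# This layout is an assumption for illustration, not the actual EF schema.
pages = [
    {"seq": 1, "tokens": {"library": {"NN": 2}, "digital": {"JJ": 1}}},
    {"seq": 2, "tokens": {"library": {"NN": 1}, "read": {"VB": 3}}},
]


def volume_term_counts(pages):
    """Sum per-page token counts into volume-level totals, ignoring part of speech."""
    totals = Counter()
    for page in pages:
        for token, pos_counts in page["tokens"].items():
            # Add up the counts recorded under every part-of-speech tag.
            totals[token] += sum(pos_counts.values())
    return totals


counts = volume_term_counts(pages)
print(counts.most_common(2))
```

The sketch works only from counts, never from running text, which suggests how this kind of analysis can proceed without the words being available in their original order.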

In his talk, Downie highlighted the motivations, challenges, and accomplishments of TORCHLITE to date, along with its next steps. These envision the creation of an international consortium of similar groups, tentatively called the "Cultural Open Data Exchange (CODEx)," which will promote and extend HTRC's EF model and methods, enabling other cultural heritage institutions to provide access to their otherwise closed collections.

Downie conducts work in digital libraries, digital humanities, and music information retrieval. He holds a bachelor's degree in music theory and composition, along with master's and doctoral degrees in library and information science, all from the University of Western Ontario. 


