School of Information Sciences

Downie presents TORCHLITE in Germany

J. Stephen Downie, Professor, Executive Associate Dean, and Co-Director of the HathiTrust Research Center

This week, Professor and Executive Associate Dean J. Stephen Downie was a guest speaker at the Herder Institute in Marburg and the University of Göttingen. Downie, who serves as co-director of the HathiTrust Research Center (HTRC), lectured on the HTRC's "Tools for Open Research and Computation with HathiTrust: Leveraging Intelligent Text Extraction" (TORCHLITE) project.

The HTRC facilitates nonprofit and educational uses of the HathiTrust Digital Library (HTDL) by enabling computational analysis of the library's 19 million volumes, of which around 10 million are under copyright restrictions. Funded by the National Endowment for the Humanities from 2022 through 2024, TORCHLITE created easy-to-use text analysis tools, dashboards, and application programming interfaces, all of which remain active and available, to facilitate open cultural analytics research using the uniquely valuable HTDL data.

The data of interest is contained in HTRC's flagship "Extracted Features" (EF) dataset, which consists of rich metadata and statistical information derived algorithmically from the digitized texts of the entire HathiTrust corpus. The dataset documents every word on every page, including the number of times the word appears, its part of speech, and other formal features of the language on the page. The EF dataset, and methods for computing over it, have enabled many forms of full-text analysis, even of copyrighted materials. It contains nearly 3 trillion tokens (roughly speaking, words) representing more than 6 billion pages of text, making it arguably the largest open dataset of its kind that is readily available to researchers around the world.
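To make the idea of page-level features concrete, the following is a minimal sketch of how word counts tagged by part of speech can support analysis without access to the running text. The record layout, field names (`seq`, `tokens`), and sample words are hypothetical simplifications for illustration; they are not the actual EF file schema, and the helper functions are not part of any HTRC tool.

```python
from collections import Counter

# Hypothetical, simplified page records: token -> {part-of-speech tag -> count}.
# Field names and values are illustrative assumptions, not the real EF schema.
pages = [
    {"seq": 1, "tokens": {"the": {"DT": 3}, "whale": {"NN": 2}}},
    {"seq": 2, "tokens": {"whale": {"NN": 1}, "sail": {"NN": 1, "VB": 2}}},
]


def volume_token_counts(pages):
    """Sum per-page counts into one volume-level bag of words."""
    totals = Counter()
    for page in pages:
        for token, pos_counts in page["tokens"].items():
            # Collapse the part-of-speech breakdown into a single count.
            totals[token] += sum(pos_counts.values())
    return totals


def pos_counts_for(pages, token):
    """Return how often a token appears under each part-of-speech tag."""
    totals = Counter()
    for page in pages:
        # Pages that lack the token contribute nothing.
        totals.update(page["tokens"].get(token, {}))
    return totals


counts = volume_token_counts(pages)
sail_by_pos = pos_counts_for(pages, "sail")
print(counts)
print(sail_by_pos)
```

Because only counts are stored, word order within a page cannot be recovered from records like these, which is why this kind of representation can be shared for analysis even when the underlying text cannot.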

In his talk, Downie highlighted the motivations, challenges, and accomplishments of TORCHLITE to date. He also outlined the project's next steps, which envision the creation of an international consortium of similar groups, tentatively called the "Cultural Open Data Exchange (CODEx)." The consortium will promote and extend HTRC's EF model and methods, enabling other cultural heritage institutions to provide access to their otherwise closed collections.

Downie conducts work in digital libraries, digital humanities, and music information retrieval. He holds a bachelor's degree in music theory and composition, along with master's and doctoral degrees in library and information science, all from the University of Western Ontario. 


