School of Information Sciences

Downie presents TORCHLITE in Germany

J. Stephen Downie, Professor, Executive Associate Dean, and Co-Director of the HathiTrust Research Center

This week, Professor and Executive Associate Dean J. Stephen Downie was a guest speaker at the Herder Institute in Marburg and the University of Göttingen. Downie, who serves as co-director of the HathiTrust Research Center (HTRC), lectured on the HTRC's "Tools for Open Research and Computation with HathiTrust: Leveraging Intelligent Text Extraction" (TORCHLITE) project.

The HTRC facilitates nonprofit and educational uses of the HathiTrust Digital Library (HTDL) by enabling computational analysis of the library's 19 million volumes, of which around 10 million are under copyright restrictions. Funded by the National Endowment for the Humanities from 2022 through 2024, TORCHLITE created easy-to-use text analysis tools, dashboards, and application programming interfaces to facilitate open cultural analytics research using the uniquely valuable HTDL data. All of these tools remain active and available.

The data of interest is contained in HTRC's flagship "Extracted Features" (EF) dataset, which consists of rich metadata and statistical information algorithmically derived from the digitized texts of the entire HathiTrust corpus. The dataset documents every word on every page, including the number of times the word appears, its part of speech, and other formal features of the language on the page. The EF dataset, and methods for computing over it, have enabled many forms of full-text analysis, even of copyrighted materials. It contains nearly 3 trillion tokens (roughly speaking, words) representing more than 6 billion pages of text, making it arguably the largest open dataset of its kind that is readily available to researchers around the world.

In his talk, Downie highlighted the motivations, challenges, and accomplishments of TORCHLITE to date. He also described the project's next steps, which envision the creation of an international consortium of similar groups, tentatively called the "Cultural Open Data Exchange (CODEx)." The consortium would promote and extend HTRC's EF model and methods, enabling other cultural heritage institutions to provide access to their otherwise closed collections.

Downie conducts work in digital libraries, digital humanities, and music information retrieval. He holds a bachelor's degree in music theory and composition, along with master's and doctoral degrees in library and information science, all from the University of Western Ontario. 


Related News

Wang group receives ICWSM Best Dataset Paper Award

A paper from Professor Dong Wang's Social Sensing & Intelligence Lab received the Best Dataset Paper Award at the International AAAI Conference on Web and Social Media (ICWSM) held in May 2026 in Los Angeles, California. According to Wang, the paper was accepted in the first review round, which had an acceptance rate of 4.7 percent (14 of 298 submissions). 

Adler and Wang to present at RESPECT 2026

Associate Professor Rachel Adler and Informatics PhD student Olive Wang will present their work at the Association for Computing Machinery Special Interest Group on Computer Science Education Conference on Research on Equity and Sustained Participation in Engineering, Computing, and Technology (RESPECT), which will be held in Chicago this week.

Bashir group presents work at PEPR 2026

PhD students Ramazan Yener, Eryue Xu, and Mubarak Raji presented their research this week at the 2026 USENIX Conference on Privacy Engineering Practice and Respect (PEPR) in Santa Clara, California. PEPR is focused on designing and building products and systems with privacy and respect for their users and the societies in which they operate. The students received USENIX grants covering their conference registration and providing travel support.


iSchool researchers to present work at CVPR Conference

Assistant Professors Ismini Lourentzou and Yaoyao Liu, along with students from their labs, will present their research at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), held in Denver, Colorado, from June 3–7. CVPR is the flagship annual meeting of IEEE/CVF and PAMI-TC, where researchers present their latest advances in computer vision, pattern recognition, machine learning, robotics, and artificial intelligence, both in theory and practice. 

iSchool researchers to present at ChLA 2026

iSchool faculty and staff will present their research at the Children's Literature Association (ChLA) annual conference, which will be held May 28–30 in Pittsburgh, Pennsylvania. The theme of this year's conference is "Neighbors and Neighborhoods in Children's Literature, Media, and Culture."
