Downie to discuss HTRC findings at Harvard Library

J. Stephen Downie, Professor, Associate Dean for Research, and Co-Director of the HathiTrust Research Center

Professor and Associate Dean for Research J. Stephen Downie will present his recent work with the HathiTrust Research Center (HTRC) on April 30 at Harvard Library. Downie is codirector of HTRC, a collaboration between the University of Illinois, Indiana University, and the HathiTrust to enable advanced computational access to text found in the HathiTrust (HT) Digital Library.

His talk, "Creating Universal Open Access to Closed Textual Data at Scale: Use Cases from the HathiTrust Research Center," will discuss how the HTRC is creating a set of non-consumptive research services to make HT Digital Library volumes that are under copyright restrictions more open and useful to scholars.

"The creation and publication of the HTRC 'Extracted Features' (EF) dataset provides unigram counts and Part-of-Speech (POS) information for each of the 5.6 billion pages in the HT Digital Library," explained Downie. "In my talk, I will introduce two use cases that leverage the EF dataset: the 'HathiTrust + Bookworm' visualization and analysis tool, and the Workset Building environment developed to provide researchers fine-grained access to the entire HT collection (both public domain and in-copyright) via the EF dataset."

Downie leads the HathiTrust + Bookworm text analysis project, which is creating tools to visualize the evolution of term usage over time. He is also the principal investigator on the Workset Creation for Scholarly Analysis + Data Capsules project, which integrates workset models and tools, and he represents the HTRC on the NovelTM text mining project as well as the Single Interface for Music Score Searching and Analysis project. All of these projects strive to provide large-scale analytic access to copyright-restricted cultural data.


Related News

Debnath datafies "The Bulletin"

MSIM student Tan Debnath, whose interests span data mining, statistical modeling, text mining, and digital humanities, joined the Center for Children's Books as a research assistant. He was tasked with building curation processes that would datafy seventy-five years' worth of archival issues of The Bulletin of the Center for Children's Books, one of the nation's leading children's book review journals.


iSchool researchers to present at CHI 2025

iSchool faculty and students will present their research at the ACM Conference on Human Factors in Computing Systems (CHI 2025), which will be held from April 26 to May 1 in Yokohama, Japan. 

Undergraduate Research Symposium features iSchool students and mentors

Several iSchool undergraduate students will participate in the 18th annual Undergraduate Research Symposium. During the event, visitors will learn about undergraduate research projects through oral and poster presentations, creative performances, and art exhibits. All are welcome to attend the symposium, which will be held on April 24 from 9:00 a.m.-5:00 p.m. in the Illini Rooms and South Lounge of the Illini Union. Oral presentations will be held on the second floor of the Illini Union.

Wang wins grand prize at Research Live!

Informatics PhD student Olivia Wang won the Grand Prize at the 2025 Research Live! competition, which was held on April 8 in the Campus Instructional Facility Atrium. At the event, which is hosted by the Graduate College, thirteen finalists presented their graduate research in three minutes or less to a general audience. Wang received $500 as the Grand Prize winner.
