Downie to discuss HTRC findings at Harvard Library

J. Stephen Downie, Professor, Associate Dean for Research, and Co-Director of the HathiTrust Research Center

Professor and Associate Dean for Research J. Stephen Downie will present his recent work with the HathiTrust Research Center (HTRC) on April 30 at Harvard Library. Downie is co-director of HTRC, a collaboration between the University of Illinois, Indiana University, and the HathiTrust that enables advanced computational access to text in the HathiTrust (HT) Digital Library.

His talk, "Creating Universal Open Access to Closed Textual Data at Scale: Use Cases from the HathiTrust Research Center," will discuss how the HTRC is creating a set of non-consumptive research services to make HT Digital Library volumes that are under copyright restrictions more open and useful to scholars.

"The creation and publication of the HTRC 'Extracted Features' (EF) dataset provides unigram counts and Part-of-Speech (POS) information for each of the 5.6 billion pages in the HT Digital Library," explained Downie. "In my talk, I will introduce two use cases that leverage the EF dataset: the 'HathiTrust + Bookworm' visualization and analysis tool; and the Workset Building environment developed to provide researchers fine-grained access to the entire HT collection (both public domain and in-copyright) via the EF dataset."

Downie leads the HathiTrust + Bookworm text analysis project, which is creating tools to visualize the evolution of term usage over time. He is also the principal investigator on the Workset Creation for Scholarly Analysis + Data Capsules project, which integrates workset models and tools, and he represents the HTRC on the NovelTM text mining project as well as the Single Interface for Music Score Searching and Analysis project. All of these projects strive to provide large-scale analytic access to copyright-restricted cultural data.

Related News

iSchool participation in iConference 2025

The following iSchool faculty and students will participate in iConference 2025, which will be held virtually from March 11-14 and in person from March 18-22 in Bloomington, Indiana. The theme of this year's conference is "Living in an AI-gorithmic world."

Carboni joins the iSchool faculty

The iSchool is pleased to announce that Nicola Carboni has joined the faculty as an assistant professor. He previously served as a postdoctoral researcher and lecturer in digital humanities at the University of Geneva.

Youth-AI-Safety named a winning team in international hackathon

A team of researchers from the SALT (Social Computing Systems) Lab has been selected as a winner in an international hackathon hosted by the Berkeley Center for Responsible, Decentralized Intelligence. The LLM Agents MOOC Hackathon brought together over 3,000 students, researchers, and practitioners from 127 countries to build and showcase innovative work in large language model (LLM) agents, grow the AI agent community, and advance LLM agent technology.

Chan to present "Predatory Data" work at named lectures

Associate Professor Anita Say Chan will present research drawn from her new book, Predatory Data: Eugenics in Big Tech and Our Fight for an Independent Future, at two named lectures this month. The lectures, which celebrate Women's History Month, will be held at the University of Minnesota and Carnegie Mellon University.

New home for the Center for Children’s Books

The Center for Children's Books (CCB) at the iSchool is a crossroads for critical inquiry, professional training, and educational outreach related to youth-focused resources, literature, and librarianship. The CCB houses a non-circulating research collection of children’s and young adult books, with emphasis placed on books published within the last two years. The CCB recently moved to a new home in the iSchool building at 501 East Daniel Street. 
