Downie to discuss HTRC findings at Harvard Library

J. Stephen Downie, Professor and Associate Dean for Research

Professor and Associate Dean for Research J. Stephen Downie will present his recent work with the HathiTrust Research Center (HTRC) on April 30 at Harvard Library. Downie is codirector of HTRC, a collaboration between the University of Illinois, Indiana University, and the HathiTrust to enable advanced computational access to text found in the HathiTrust (HT) Digital Library.

His talk, "Creating Universal Open Access to Closed Textual Data at Scale: Use Cases from the HathiTrust Research Center," will discuss how the HTRC is creating a set of non-consumptive research services to make HT Digital Library volumes that are under copyright restrictions more open and useful to scholars.

"The creation and publication of the HTRC 'Extracted Features' (EF) dataset provides unigram counts and Part-of-Speech (POS) information for each of the 5.6 billion pages in the HT Digital Library," explained Downie. "In my talk, I will introduce two use cases that leverage the EF dataset: the 'HathiTrust + Bookworm' visualization and analysis tool; and the Workset Building environment developed to provide researchers fine-grained access to the entire HT collection (both public domain and in-copyright) via the EF dataset."

Downie leads the HathiTrust + Bookworm text analysis project, which is creating tools to visualize the evolution of term usage over time. He is also the principal investigator on the Workset Creation for Scholarly Analysis + Data Capsules project, which integrates workset models and tools, and he represents the HTRC on the NovelTM text mining project as well as the Single Interface for Music Score Searching and Analysis project. All of these projects strive to provide large-scale analytic access to copyright-restricted cultural data.


Related News

Join the iSchool at ALISE 2019

Join iSchool faculty and students for the annual conference of the Association for Library and Information Science Education (ALISE), which will take place September 24-26 in Knoxville, Tennessee. The theme of ALISE 2019 is "Exploring Learning in a Global Information Context." Dean and Professor Eunice E. Santos will provide welcoming remarks at the iSchool-sponsored School Representative's Breakfast at 7:30 a.m. on September 25.

iSchool faculty ranked as excellent for Summer 2019

Six iSchool instructors were named in the University's List of Teachers Ranked as Excellent for Summer 2019. The rankings are released every semester, and results are based on the Instructor and Course Evaluation System (ICES) questionnaire forms maintained by Measurement and Evaluation in the Center for Innovation in Teaching and Learning.

Underwood to discuss machine learning at Sawyer Seminar

Professor Ted Underwood will present his research on machine learning at the University of Pittsburgh on September 19. His talk is part of the University's Sawyer Seminar, a year-long project funded by The Andrew W. Mellon Foundation that brings together a diverse range of practitioners and disciplinary specialists to analyze the co-evolution of data and method across more than a century.


Chan presents research at 4S 2019

Associate Professor Anita Say Chan presented her research at the Annual Meeting of the Society for the Social Studies of Science (4S 2019), which took place in New Orleans on September 4-7. The Society is an international, nonprofit association that fosters interdisciplinary scholarship in social studies of science, technology, and medicine (a field often referred to as STS). The theme of this year's meeting was "Innovations, Interruptions, and Regenerations."


Schneider discusses argumentation mining research

Assistant Professor Jodi Schneider presented her research on argumentation mining at a doctoral workshop at the University of Fribourg in Switzerland on September 2-3. Her lecture and tutorials were featured during the University's Language and Cognition program's "Linguistic and Corpus Perspectives on Argumentative Discourse" workshop. Schneider discussed problem definitions, corpora, and argument annotation for mining arguments from text.
