Researchers rely on collections of books and other materials to support their scholarship. From these collections, scholars select, organize, and refine the worksets that serve their particular research objectives. The requirements for those worksets are becoming increasingly sophisticated and complex as humanities scholarship becomes both more interdisciplinary and more digital.
The HathiTrust Research Center (HTRC) is developing computational research access to some 10 million volumes (3 billion pages) in the HathiTrust corpus, a digital library of millions of books and other materials digitized by the Google Books project and other mass-digitization efforts. The HTRC is a collaborative research center launched jointly by Indiana University and the University of Illinois, along with the HathiTrust Digital Library.
Given the unprecedented size and scope of the HathiTrust corpus—in conjunction with the HTRC’s unique computational access to copyrighted materials—the Workset Creation for Scholarly Analysis: Prototyping Project (WCSA) will engage scholars in designing tools for exploration, location, and analytic grouping of materials so they can routinely conduct computational scholarship at scale, based on meaningful worksets. WCSA will address three sets of tightly intertwined research questions regarding: (1) enriching the metadata in the HathiTrust corpus; (2) augmenting string-based metadata with URIs to leverage discovery and sharing through external services; and (3) formalizing the notion of collections and worksets in the context of the HTRC.
WCSA is directed by iSchool Associate Dean for Research J. Stephen Downie, professor and HTRC codirector; iSchool-affiliated faculty member Timothy W. Cole (MS '89), professor at the University of Illinois Library; and Beth A. Plale, professor at the Indiana University School of Informatics and Computing and HTRC codirector.