HTRC

IN THE NEWS

Aug. 4, 2017

The iSchool at Illinois is involved in a partnership that has received a research grant from the Institute of Museum and Library Services for an extension of the Data Capsule service, which enables remote access by the HathiTrust Digital Library to other collections managed by research libraries. The partnership is led by the School of Informatics and Computing at Indiana University. 
  
As the volume of digital content has expanded exponentially over the past several years, researchers and educators have recognized the potential of big data techniques to analyze, access, and organize digital scholarly collections. The Data Capsule service, which was developed for use in the HathiTrust Research Center (HTRC), creates virtual computers for users to access a restricted collection. Within HTRC, the Data Capsule service is used for non-consumptive analytics, which allow the computer to analyze the text but doesn’...

Jun. 12, 2017

The iSchool is co-organizing a workshop on digital scholarship with Beijing Institute of Technology (BIT) Library on June 14-16 in Beijing. The workshop, Digital Scholarship Centers: Building Library Services for Data-Driven Scholarship, will instruct participants in library service models for digital scholarship and discuss concepts in digital humanities and computational social science. Dean Allen Renear will give opening remarks. Other iSchool presenters include J. Stephen Downie, professor and codirector of the HathiTrust Research Center (HTRC); Peter Organisciak (PhD '15), postdoctoral research associate; Eleanor Dickson, visiting HTRC digital humanities specialist; and Nic Weber (PhD '15), assistant professor at the University of Washington.

Downie will give the talks:

  • "Text Mining Concepts and Methods: HTRC and Non-Consumptive Research"
  • "Quick and Painless Introduction to Machine Learning"
  • "WEKA Machine Learning Tools: A Friendly...

May. 30, 2017

J. Stephen Downie, professor and associate dean for research, has been named a National Center for Supercomputing Applications (NCSA) Faculty Fellowship awardee for the 2017-18 academic year. Faculty Fellows work with NCSA on specific projects aimed at helping solve grand challenges facing all people, including deep learning, the internet of things, data analysis, volcano activity, and more.

Downie’s project is titled, “Modeling the Massive HathiTrust Corpus: Creating Concept-Based Representations of 15 Million Volumes.” Through this research, he hopes to make the HathiTrust collection—15 million books spanning multiple centuries—available for large-scale research use through optimized,...

Jan. 24, 2017

J. Stephen Downie, professor and associate dean for research, participated in the Center for Open Data in the Humanities (CODH) seminar, "Big Data and Digital Humanities," on January 23 at the National Institute of Informatics in Tokyo, Japan.

Started in April 2016, the CODH will be formally established as a center in April 2017. It involves faculty from the National Institute of Informatics and The Institute of Statistical Mathematics, both in Japan, who collaborate with computer scientists and humanities scholars around the globe. CODH promotes research and development to improve access to humanities data, using the concept of open science along with the latest technology in informatics and statistics.

Downie gave the presentation, "Digital humanities using both closed and open data: Use cases from the HathiTrust Research Center":

The HathiTrust Digital...

Dec. 5, 2016

Unique in its sheer size and breadth, a new open dataset released by the HathiTrust Research Center (HTRC) will provide researchers with access to otherwise restricted information. The HTRC Extracted Features (EF) Dataset reports counts of words, lines, parts of speech, and other details extracted from each page of the more than thirteen million volumes found in the HathiTrust Digital Library.

An earlier release of the EF Dataset, drawn from a subset covering only the five million volumes in HathiTrust's public domain collection, has enabled novel research from scholars in economics, history, linguistics, literary studies, and sociology, among other fields. The new EF dataset, released under a Creative Commons Attribution license, provides access to features drawn from the remaining eight million volumes that otherwise would be...

May. 5, 2016

Who influenced Charles Darwin when he was writing his pioneering theory of evolution, On the Origin of Species? Indiana University (IU) professor Colin Allen wants to know, and the HathiTrust Research Center may now hold the answer.

The HathiTrust Research Center (HTRC), a cooperative service of Indiana University, the University of Illinois, and HathiTrust, has expanded its services to support computational research on the entire collection of one of the world’s largest digital libraries, held by HathiTrust. HathiTrust’s collections comprise over 14 million digitized volumes, among them more than 7 million books, more than 725,000 US federal government documents, and more than 350,000 serial publications. The collections are drawn from some of the largest research libraries in North America, including Indiana University and the University of...

Feb. 23, 2016
Tim Cole, right, mathematics librarian, is helping develop tools so scholars like Ted Underwood, left, can use computational analysis to answer research questions. J. Stephen Downie, center, is the project lead. Photo by Joyce Seay-Knoblauch.

Illinois English professor Ted Underwood wants to know how the language describing male and female characters in works of fiction has changed since the late eighteenth century. He’s using data mining tools to gather information from thousands of books to answer that question.

The problem, though, is that books published after 1922 are still under copyright protection and their content can’t be shared freely online.

“There are hundreds of thousands of books out there, and we don’t talk about them,”...
