HTRC

IN THE NEWS

Nov. 21, 2017

Professor and Associate Dean for Research J. Stephen Downie was a keynote speaker at the 7th Rizal Library International Conference, held November 16-18 at Ateneo de Manila University in Quezon City, Philippines. The theme of the conference was "CLICK! Connecting Libraries, Information, and Community Knowledge."

Downie gave the presentation, "HathiTrust Research Center: Text mining the very big data of the HathiTrust Digital Library." HathiTrust Digital Library is a partnership of more than 100 university and public libraries, which has amassed a collection of over 15 million volumes and 5.5 billion pages. While researchers are applying data mining and text analysis techniques to reveal new knowledge buried within the collection, roughly 10 million volumes are under copyright restrictions and cannot be shared directly with researchers.

In his talk, Downie, codirector of the...

Oct. 25, 2017

The HathiTrust Research Center (HTRC) will host its 2018 UnCamp on January 25-26 at the University of California, Berkeley. The primary venue will be the newly renovated Moffitt Library with breakout events in nearby campus locations, including the Berkeley Institute for Data Science, Morrison Library, D-Lab in Barrows Hall, and Academic Innovation Studio. Registration is now open.

UnCamp brings together digital humanities researchers and tool developers as well as librarians and graduate students. It combines hands-on coding and demonstrations, inspirational use-case studies, lightning talks, and breakout sessions—all structured in the dynamic setting of a participant-driven, unconference programming format. This year's event will feature keynote presentations about the IMLS-funded project Aida (Image...

Aug. 4, 2017

The iSchool at Illinois is involved in a partnership that has received a research grant from the Institute of Museum and Library Services for an extension of the Data Capsule service, which enables remote access by the HathiTrust Digital Library to other collections managed by research libraries. The partnership is led by the School of Informatics and Computing at Indiana University. 
  
As the volume of digital content has expanded exponentially over the past several years, researchers and educators have recognized the potential of big data techniques to analyze, access, and organize digital scholarly collections. The Data Capsule service, which was developed for use in the HathiTrust Research Center (HTRC), creates virtual computers for users to access a restricted collection. Within HTRC, the Data Capsule service is used for non-consumptive analytics, which allow the computer to analyze the text but doesn’...

Jun. 12, 2017

The iSchool is co-organizing a workshop on digital scholarship with Beijing Institute of Technology (BIT) Library on June 14-16 in Beijing. The workshop, Digital Scholarship Centers: Building Library Services for Data-Driven Scholarship, will instruct participants in library service models for digital scholarship and discuss concepts in digital humanities and computational social science. Dean Allen Renear will give opening remarks. Other iSchool presenters include J. Stephen Downie, professor and codirector of the HathiTrust Research Center (HTRC); Peter Organisciak (PhD '15), postdoctoral research associate; Eleanor Dickson, visiting HTRC digital humanities specialist; and Nic Weber (PhD '15), assistant professor at the University of Washington.

Downie will give the talks:

  • "Text Mining Concepts and Methods: HTRC and Non-Consumptive Research"
  • "Quick and Painless Introduction to Machine Learning"
  • "WEKA Machine Learning Tools: A Friendly...

May 30, 2017

J. Stephen Downie, professor and associate dean for research, has been named a National Center for Supercomputing Applications (NCSA) Faculty Fellowship awardee for the 2017-18 academic year. Faculty Fellows work with NCSA on specific projects aimed at helping solve grand challenges facing all people, in areas including deep learning, the internet of things, data analysis, volcano activity, and more.

Downie’s project is titled, “Modeling the Massive HathiTrust Corpus: Creating Concept-Based Representations of 15 Million Volumes.” Through this research, he hopes to make the HathiTrust collection—15 million books spanning multiple centuries—available for large-scale research use through optimized,...

Jan. 24, 2017

J. Stephen Downie, professor and associate dean for research, participated in the Center for Open Data in the Humanities (CODH) seminar, "Big Data and Digital Humanities," on January 23 at the National Institute of Informatics in Tokyo, Japan.

Started in April 2016, the CODH will be formally established as a center in April 2017. It involves faculty from the National Institute of Informatics and The Institute of Statistical Mathematics, both in Japan, who collaborate with computer scientists and humanities scholars around the globe. CODH promotes research and development to improve access to humanities data, using the concept of open science along with the latest technology in informatics and statistics.

Downie gave the presentation, "Digital humanities using both closed and open data: Use cases from the HathiTrust Research Center":

The HathiTrust Digital...

Dec. 5, 2016

Unique in its sheer size and breadth, a new open dataset released by the HathiTrust Research Center (HTRC) will provide researchers with access to otherwise restricted information. The HTRC Extracted Features (EF) Dataset reports quantitative counts of words, lines, parts of speech, and other details extracted from each page of the more than thirteen million volumes found in the HathiTrust Digital Library. 

An earlier release of the EF Dataset, drawn from a subset covering only the five million volumes in HathiTrust's public domain collection, has enabled novel research from scholars in economics, history, linguistics, literary studies, and sociology, among other fields. The new EF dataset, released under a Creative Commons Attribution license, provides access to features drawn from the remaining eight million volumes that otherwise would be...
