Information Retrieval

RELATED RESEARCH PROJECTS

National Endowment for the Humanities

The HathiTrust Research Center (HTRC) is partnering with the Cultural Observatory team that developed the Google Books Ngram Viewer together with Google. The goal of this collaboration is to implement a greatly enhanced version of the Cultural Observatory’s open-source “Bookworm” text analysis and visualization tool, designed to help scholars meet the challenges posed by the massive scale of the HathiTrust (HT) corpus. We are calling our multi-disciplinary, multi-institutional collaboration the HathiTrust + Bookworm (HT+BW) Project. Participating institutions include the University of Illinois, Indiana University, Northeastern University, Baylor College of Medicine, and Rice University.

Bookworm is a tool that visualizes language usage trends in repositories of...

Andrew W. Mellon Foundation

This project, conducted collaboratively by the iSchool and the University Library, will further our understanding of four translational research questions:

  1. As compared to general collection catalog records, item-level metadata for digitized special collections are frequently more granular, richer in non-bibliographic entities, and expressed using custom vocabularies and schemas. What differences and additional challenges are encountered when transforming legacy special collections metadata records into LOD?
  2. Typically, interfaces used to discover and view digitized special collections are disconnected from the online public access catalogs and ancillary services used to provide user access to general library collections. Can LOD reconnect library special and...

Google

Despite the ubiquity of search in many people’s daily lives, a lack of search literacy can make it difficult to find solutions to technical problems, such as completing software-based tasks like troubleshooting program installations. iSchool Professor Michael Twidale and Assistant Professor Max Wilson of the University of Nottingham have received funding from Google for a project that aims to develop an understanding of search literacy, and to recommend best practices for teaching technical search literacy and creating tools in support of this kind of search.

Social Sciences and Humanities Research Council of Canada

Music prints and manuscripts created over the past thousand years sit on the shelves of libraries and museums around the globe. As these organizations digitize their collections, images of these scores are increasingly accessible online. However, the musical content remains difficult to search.

Google Books and HathiTrust have already made it possible to search the content of text documents through Optical Character Recognition (OCR), which transforms digital images of texts into a symbolic representation that can be searched by computers. For digital images of musical scores, the analogous technology is Optical Music Recognition (OMR).

The research team is working to improve OMR technology so that computers can recognize the musical symbols in these images, enabling us...

HathiTrust

The HathiTrust has provided funding for the HathiTrust Research Center (HTRC), co-located at the University of Illinois and Indiana University, to serve as the research arm of the HathiTrust and create an agile, technology-rich service for researchers in the digital humanities, social sciences, natural sciences, and informatics. This service will help researchers conduct non-consumptive research on the HathiTrust digital library database, a collection of just under 14 million digitized volumes, equating to 4.9 billion pages, 60% of which is under some copyright restriction. At the same time, center staff will develop and refine tools to aid in digital humanities and text mining research over large databases and will operate the secure, large-scale computation environment required by this...

National Science Foundation

Scholarly publications today are still mostly disconnected from the underlying data and code used to produce the published results and findings, despite an increasing recognition of the need to share all aspects of the research process. As data become more open and transportable, a second layer of research output has emerged, linking research publications to the associated data, possibly along with its provenance. This trend is being rapidly followed by a new third layer: communicating the process of inquiry itself by sharing a complete computational narrative that links method descriptions with executable code and data, thereby introducing a new era of reproducible science and accelerated knowledge discovery. In the Whole Tale (WT) project, all of these components are linked and accessible...

Andrew W. Mellon Foundation

This project builds upon, extends, and integrates two developmental research threads within the HathiTrust Research Center (HTRC). The first thread originates from work that was conducted in the Workset Collections for Scholarly Analysis (WCSA): Prototyping Project. The second thread continues the work of the Data Capsules (DC) project, previously supported by the Alfred P. Sloan Foundation (2011-2014). The primary objective of the WCSA+DC project is the seamless integration of the workset model and tools with the Data Capsule framework to provide non-consumptive research access to HathiTrust's massive corpus of data objects, securely and at...

IN THE NEWS

Aug. 31, 2017

Members of the Whole Tale Archaeology Working Group will meet with fellow computational archaeologists, environmental scientists, and other researchers for the first "Prov-a-thon" on practical tools for reproducible science. Held in conjunction with the DataONE All-Hands Meeting in Santa Ana Pueblo, New Mexico, the two-day workshop on August 31 and September 1 is cosponsored by the NSF-funded projects Whole Tale, DataONE, and the Arctic Data Center.

The goal of the workshop is to expose scientists to existing and emerging provenance tools from DataONE, Whole Tale, and other projects (e.g., SKOPE), and conversely, to gather feedback, new requirements, and new ideas for effective uses of provenance from the scientific community. The first day of the workshop...

May 16, 2017

Doctoral student Craig Willis has received funding from the National Institutes of Health (NIH) to work with the biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE)/DataMed team on a pilot project this summer. The award is based on his participation in the 2016 bioCADDIE Dataset Retrieval Challenge, which had the objective of creating innovative ways for biomedical researchers to search and discover biomedical research data.

"bioCADDIE is an NIH Big Data to Knowledge (BD2K) project to develop the DataMed system, sometimes described as the 'PubMed of data,'" said Willis. "At the end of the challenge, they awarded two subcontracts. The goal of my project is to prototype and evaluate expansion models for integration into the DataMed...

Jul. 13, 2016

Associate Professor Miles Efron will participate in the 39th International Conference of the Association for Computing Machinery (ACM) Special Interest Group on Information Retrieval (SIGIR). The conference will be held July 17-21 in Pisa, Italy.

Efron and his doctoral students Craig Willis and Garrick Sherman will present the short paper, “What Makes a Query Temporally Sensitive.”

From the abstract: This work takes an in-depth look at the factors that affect manual classifications of “temporally sensitive” information needs. We use qualitative and quantitative techniques to analyze 660 topics from the Text Retrieval Conference (TREC) previously used in the experimental evaluation of temporal retrieval models. Regression analysis is used to model previous manual classifications. We identify factors and potential problems with previous classifications, proposing principles and guidelines for future work on...

Mar. 1, 2016

Every month, Google alone fields billions of search requests. The staggering demand for information, coupled with the exponentially growing amount of information available, means that reliable search results are key to maneuvering a flooded information landscape.

Associate Professor Miles Efron is among the leading scholars investigating ways to improve search. With funded research projects supported by the National Science Foundation as well as by industry partners such as Google, he looks at the issue from a variety of angles, including questions of query representation and how temporal factors affect the relationship between queries and relevant information.

Though his research is thick with writing code and creating algorithms, Efron approaches his work through the lens of a humanist, incorporating his academic background in classics and medieval studies. “My goal is to translate familiar humanist concerns and see how they resonate in the kinds of domains that...

Feb. 23, 2016
Tim Cole, right, mathematics librarian, is helping develop tools so scholars like Ted Underwood, left, can use computational analysis to answer research questions. J. Stephen Downie, center, is the project lead. Photo by Joyce Seay-Knoblauch.

Illinois English professor Ted Underwood wants to know how the language describing male and female characters in works of fiction has changed since the late eighteenth century. He’s using data mining tools to gather information from thousands of books to answer that question.

The problem, though, is that books published after 1922 are still under copyright protection and their content can’t be shared freely online.

“There are hundreds of thousands of books out there, and we don’t talk about them,”...

Feb. 17, 2016

GSLIS master’s students Jessica Colbert and Annabella Irvine will participate in the Midwest Bisexual Lesbian Gay Transgender Ally College Conference (MBLGTACC) this week, where they will lead a workshop on locating LGBT materials in libraries and will represent the GSLIS student group, Queer Library Alliance (QLA). Held annually, the interdisciplinary conference is organized by students and is the largest event of its kind in the country. MBLGTACC 2016 will be held at Purdue University on February 19-21.

The workshop, “Finding Ourselves in the Library: Locating LGBT Materials in Libraries,” was developed by Colbert and fellow GSLIS MS/LIS student Brittany Craig.

Abstract: This workshop serves to assist students in pursuit of queer literatures, histories, and other LGBTQIA-related library materials. Libraries have been a hub for marginalized populations, particularly...

Jan. 25, 2016

Associate Professor Miles Efron has been named the GSLIS Centennial Scholar for 2015-2016. The Centennial Scholar award is endowed by alumni and friends of GSLIS and given in recognition of outstanding accomplishments and/or professional promise in the field of library and information science.

“This is a real honor. One of the things that makes GSLIS a great academic home is the excellence and intellectual diversity of our faculty. To be recognized in this way by colleagues whom I really admire is so gratifying. I give my strongest thanks to the GSLIS faculty for this recognition and support of my work,” Efron said.

“This award will help me to continue organizing GSLIS’s ongoing participation in the annual Text Retrieval Conference (TREC), hosted by the National Institute of Standards and Technology. It will also afford me a much-welcomed freedom to pursue a project in the digital humanities—analyzing data from the HathiTrust—that I have had on the back burner for a...
