Data Analytics


Institute of Museum and Library Services

This project will create a specialization in Socio-technical Data Analytics (SODA) at both the master's and doctoral levels. Partnerships with local researchers and businesses who already work with large datasets will give master's students first-hand experience with both the social and technical implications of large digital data collections, preparing them for leadership roles in academic and corporate environments. Similarly, doctoral students will consider multiple stages of the information lifecycle, which will help ensure that their research findings generalize to a range of scholarly and business practices. Case studies from these partners will be incorporated into new courses that will initially be held on campus and will later be extended to the School...

National Center for Supercomputing Applications

Assistant Professor Jana Diesner received a Faculty Fellowship and seed funding for her project, “Predictive Modeling for Impact Assessment,” from the National Center for Supercomputing Applications (NCSA). Diesner collaborates closely with NCSA scientists on the project, which builds on her work developing computational solutions to assess the impact of issue-focused information projects such as social justice documentaries and books. Her research team leverages big social data for this purpose, combining techniques from machine learning and natural language processing to identify a fine-grained set of impact factors from textual data sources such as news articles, reviews, and social media. This project aims to locate...

Ford Foundation

Films are produced, screened and perceived as part of a larger and continuously changing ecosystem that involves multiple stakeholders and themes. This project will measure the impact of social justice documentaries by capturing, modeling and analyzing the map of these stakeholders and themes in a systematic, scalable and analytically rigorous fashion. This solution will result in a validated, reusable, and end-user-friendly methodology and technology that practitioners can use to assess the long-term impact of media productions beyond the number of people who have seen a screening or visited a webpage. Moreover, bringing the proposed computational methodology into a real-world application context can serve as a case study for demonstrating the usability of this cutting-edge solution...

Social Sciences and Humanities Research Council of Canada

Music prints and manuscripts created over the past thousand years sit on the shelves of libraries and museums around the globe. As these organizations digitize their collections, images of these scores are increasingly accessible online. However, the musical content remains difficult to search.

Google Books and HathiTrust have already made it possible to search the content of text documents through Optical Character Recognition (OCR), which transforms digital images of texts into a symbolic representation that can be searched by computers. For digital images of musical scores, the analogous technology is Optical Music Recognition (OMR).

The research team is working to improve OMR technology so that computers can recognize the musical symbols in these images, enabling us...


INDICATOR is a novel information system for collecting, integrating, and analyzing data from multiple sources to provide public health decision makers with real-time data on the health of their community. The data come from sources as varied as emergency department visits, school attendance, veterinary clinics, and social media postings, and together they have been used to change public policy during outbreak events.

Funding for this project was provided by the Carle Foundation, Centers for Disease Control and Prevention, and the U.S. Department of Agriculture.

National Science Foundation

Scholarly publications today are still mostly disconnected from the underlying data and code used to produce the published results and findings, despite an increasing recognition of the need to share all aspects of the research process. As data become more open and transportable, a second layer of research output has emerged, linking research publications to the associated data, possibly along with its provenance. This trend is rapidly being followed by a third layer: communicating the process of inquiry itself by sharing a complete computational narrative that links method descriptions with executable code and data, thereby introducing a new era of reproducible science and accelerated knowledge discovery. In the Whole Tale (WT) project, all of these components are linked and accessible...


Feb. 3, 2017

Join Blake Giles, Manager of Research Park Operations at Intelligent Medical Objects (IMO), to hear more about the work IMO does and the summer internship opportunities it offers. Online students are welcome to join us remotely.

Jan. 30, 2017

Doctoral student Shadi Rezapour and Assistant Professor Jana Diesner will present a paper at the 11th IEEE International Conference on Semantic Computing (ICSC 2017), which will be held January 30 through February 1 in San Diego, California. ICSC 2017 provides an international forum for researchers and practitioners in academia and industry to present research that advances the state of semantic computing and identifies emerging research topics.

Rezapour and Diesner will present "Identifying the Overlap between Election Result and Candidates' Ranking based on Hashtag-Enhanced, Lexicon-Based Sentiment Analysis." The paper's coauthors include Lufan Wang (Department of Civil and Environmental Engineering) and Omid Abdar (Department of Linguistics).

Abstract: The popularity and availability of Twitter as a service and a data source have fueled the interest in sentiment analysis. Previous research has shed...

Jan. 24, 2017

J. Stephen Downie, professor and associate dean for research, participated in the Center for Open Data in the Humanities (CODH) seminar, "Big Data and Digital Humanities," on January 23 at the National Institute of Informatics in Tokyo, Japan.

Started in April 2016, the CODH will be formally established as a center in April 2017. It involves faculty from the National Institute of Informatics and The Institute of Statistical Mathematics, both in Japan, who collaborate with computer scientists and humanities scholars around the globe. CODH promotes research and development to improve access to humanities data, using the concept of open science along with the latest technology in informatics and statistics.

Downie gave the presentation, "Digital humanities using both closed and open data: Use cases from the HathiTrust Research Center":

The HathiTrust Digital...

Dec. 12, 2016

By using products such as soap, shampoo, body lotion, toothpaste and makeup, the average consumer may be exposed to dozens of chemicals each day. It's not easy, though, to know exactly what is in many consumer products or what potential risks they pose, either individually or in combination.

A doctoral student and a professor in the University of Illinois School of Information Sciences are using an informatics approach to help prioritize chemical combinations for further testing by determining the prevalence of individual ingredients and their most likely combinations in consumer products.

Doctoral student Henry Gabb and professor Catherine Blake published the results of the first phase of their work in Environmental Health Perspectives, a journal of the National Institute of Environmental Health Sciences, part of the National Institutes of Health...

Dec. 9, 2016

Doctoral students Ming Jiang and Shubhanshu Mishra will present research papers at the 26th International Conference on Computational Linguistics (COLING), which will be held December 11-16 in Osaka, Japan. Held every two years, COLING is one of the top international conferences in natural language processing and computational linguistics, covering research topics such as question answering, text summarization, information extraction, and discourse structure.

Jiang will present a paper coauthored with Assistant Professor Jana Diesner titled "Says Who...? Identification of Expert versus Layman Critics’ Reviews of Documentary Films."

Abstract: We extend classic review mining work by building a binary classifier that predicts whether a review of a documentary film was written by an expert or a...

Dec. 5, 2016

Unique in its sheer size and breadth, a new open dataset released by the HathiTrust Research Center (HTRC) will provide researchers with access to otherwise restricted information. The HTRC Extracted Features (EF) Dataset reports quantitative counts of words, lines, parts of speech, and other details extracted from each page of the more than thirteen million volumes found in the HathiTrust Digital Library. 

An earlier release of the EF Dataset, drawn from a subset covering only the five million volumes in HathiTrust's public domain collection, has enabled novel research from scholars in economics, history, linguistics, literary studies, and sociology, among other fields. The new EF dataset, released under a Creative Commons Attribution license, provides access to features drawn from the remaining eight million volumes that otherwise would be...

Oct. 7, 2016

Assistant Professor Vetle Torvik has been named the iSchool's Centennial Scholar for 2016-2017. The Centennial Scholar award is endowed by alumni and friends of the School and given in recognition of outstanding accomplishments and/or professional promise in the field of library and information science.

Torvik expressed surprise and gratitude at receiving this honor. "I am in awe of colleagues who received it before me; their caliber is off the charts," he said. "I hope to use the award to open new doors—a stamp of approval from colleagues who know you well goes a long way to establish new collaborations necessary to solve the increasingly complex problems facing science and society today."

Torvik joined the faculty in 2011. His current research addresses problems related to scientific discovery and collaboration using complex models and large-scale bibliographic databases. He is the author of articles in journals such as Proceedings of the National Academy of...