Data Analytics


Institute of Museum and Library Services

This project will create both a master's and doctoral-level specialization in Socio-technical Data Analytics (SODA). Partnerships with local researchers and businesses who already work with large data sets will enable master's graduates to receive first-hand experience with both the social and technical implications of large digital data collections, and thus be well-prepared for leadership roles in academic and corporate environments. Similarly, doctoral students will consider multiple stages of the information lifecycle, which will help to ensure that their research findings will generalize to a range of scholarly and business practices. Case studies from these partners will be incorporated into new courses that will initially be held on campus and will later be evolved to the School...

National Center for Supercomputing Applications

Assistant Professor Jana Diesner received a Faculty Fellowship and seed funding for her project, “Predictive Modeling for Impact Assessment,” from the National Center for Supercomputing Applications (NCSA). Diesner collaborates closely with NCSA scientists on the project, which builds on her work developing computational solutions to assess the impact of issue-focused information projects such as social justice documentaries and books. Her research team leverages big social data for this purpose and combines techniques from machine learning and natural language processing to identify a fine-grained set of impact factors from textual data sources such as news articles, reviews, and social media. This project aims to locate...

Ford Foundation

Films are produced, screened and perceived as part of a larger and continuously changing ecosystem that involves multiple stakeholders and themes. This project will measure the impact of social justice documentaries by capturing, modeling and analyzing the map of these stakeholders and themes in a systematic, scalable and analytically rigorous fashion. This solution will result in a validated, reusable and end-user-friendly methodology and technology that practitioners can use to assess the long-term impact of media productions beyond the number of people who have seen a screening or visited a webpage. Moreover, bringing the proposed computational methodology into a real-world application context can serve as a case study for demonstrating the usability of this cutting-edge solution...

Social Sciences and Humanities Research Council of Canada

Music prints and manuscripts created over the past thousand years sit on the shelves of libraries and museums around the globe. As these organizations digitize their collections, images of these scores are increasingly accessible online. However, the musical content remains difficult to search.

Google Books and HathiTrust have already made it possible to search the content of text documents through Optical Character Recognition (OCR), which transforms digital images of texts into a symbolic representation that can be searched by computers. For digital images of musical scores, the analogous technology is Optical Music Recognition (OMR).

The research team is working to improve OMR technology so that computers can recognize the musical symbols in these images, enabling us...


INDICATOR is a novel information system for collecting, integrating, and analyzing data from multiple sources to provide public health decision makers with real-time data on the health of their community. The data come from sources as varied as emergency department visits, school attendance, veterinary clinics, and social media postings, and together they have been used to change public policy in outbreak events.

Funding for this project was provided by the Carle Foundation, Centers for Disease Control and Prevention, and the U.S. Department of Agriculture.

National Science Foundation

Scholarly publications today are still mostly disconnected from the underlying data and code used to produce the published results and findings, despite an increasing recognition of the need to share all aspects of the research process. As data become more open and transportable, a second layer of research output has emerged, linking research publications to the associated data, possibly along with its provenance. This trend is being rapidly followed by a new third layer: communicating the process of inquiry itself by sharing a complete computational narrative that links method descriptions with executable code and data, thereby introducing a new era of reproducible science and accelerated knowledge discovery. In the Whole Tale (WT) project, all of these components are linked and accessible...

National Science Foundation

Catastrophic events such as Fukushima and Katrina have made it clear that integrating physical and social causes of failure into a cohesive modeling framework is critical in order to prevent complex technological accidents and to maintain public safety and health. In this research, experts in the disciplines of Probabilistic Risk Assessment (PRA), Organizational Behavior, and Information Science and Data Analytics collaborate to provide answers to the following key questions: what social and organizational factors affect technical system risk; how and why do these factors influence risk; and how much do they contribute to risk? In addition to scientific contributions to organizational science, PRA, and data analytics, this research provides regulatory and industry decision-makers with...

Korea Institute of Science and Technology Information

How do limitations and a lack of transparency in data quality and data provenance bias research outcomes, and how can we detect and mitigate these limitations? For example, we have been investigating the impact of entity resolution errors on network analysis results. We found that commonly reported network metrics and derived implications can strongly deviate from the truth (as established based on gold standard data or approximations thereof), depending on the efforts dedicated to entity resolution.
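The effect of entity resolution on a network metric can be illustrated with a toy example. The sketch below is not the project's method or data: the co-authorship edges, the names, and the `degrees` helper are all hypothetical, and the only error modeled is one person appearing under two name variants.

```python
from collections import defaultdict


def degrees(edges, alias=None):
    """Return the degree of each node in an undirected edge list.

    `alias` optionally maps a name variant to its canonical form, which
    stands in for an entity resolution step (hypothetical helper).
    """
    alias = alias or {}
    neighbors = defaultdict(set)
    for u, v in edges:
        u, v = alias.get(u, u), alias.get(v, v)
        if u != v:  # ignore self-loops created by merging variants
            neighbors[u].add(v)
            neighbors[v].add(u)
    return {node: len(adj) for node, adj in neighbors.items()}


# Invented co-authorship edges; "J. Smith" and "John Smith" are assumed
# to be the same person recorded under two spellings.
edges = [
    ("J. Smith", "A. Lee"),
    ("John Smith", "B. Kim"),
    ("John Smith", "C. Diaz"),
    ("A. Lee", "B. Kim"),
]

raw = degrees(edges)
resolved = degrees(edges, alias={"J. Smith": "John Smith"})

print(len(raw), raw["John Smith"])            # unresolved network
print(len(resolved), resolved["John Smith"])  # after merging the variants
```

In this toy case, leaving the two spellings unmerged inflates the node count and understates the degree of the merged author, which is the kind of deviation from gold standard data the paragraph describes.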


How can we use user-generated content to construct, infer or refine network data? We have been tackling this problem by leveraging communication content produced and disseminated in social networks to enhance graph data. For example, we have used domain-adjusted sentiment analysis to label graphs with valence values in order to enable triadic balance assessment. The resulting method enables fast and systematic sign detection, eliminates the need for surveys or manual link labeling, and reduces issues with leveraging user-generated (meta)data.
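Triadic balance assessment on a signed graph can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the `triad_balance` function and the edge labels are hypothetical, the signs are entered by hand rather than produced by sentiment analysis, and the rule applied is the standard structural balance criterion that a closed triad is balanced when the product of its three edge signs is positive.

```python
from itertools import combinations


def triad_balance(signed_edges):
    """Split closed triads into balanced and unbalanced ones.

    `signed_edges` maps an undirected edge (u, v) to +1 or -1.
    A triad counts only if all three of its edges are present.
    """
    sign = {frozenset(edge): s for edge, s in signed_edges.items()}
    nodes = sorted({node for edge in sign for node in edge})
    balanced, unbalanced = [], []
    for a, b, c in combinations(nodes, 3):
        pairs = (frozenset((a, b)), frozenset((b, c)), frozenset((a, c)))
        if all(pair in sign for pair in pairs):
            product = sign[pairs[0]] * sign[pairs[1]] * sign[pairs[2]]
            (balanced if product > 0 else unbalanced).append((a, b, c))
    return balanced, unbalanced


# Invented valence labels standing in for sentiment-derived edge signs.
edges = {
    ("ann", "bob"): +1,
    ("bob", "cat"): +1,
    ("ann", "cat"): +1,
    ("bob", "dan"): -1,
    ("cat", "dan"): +1,
}

balanced, unbalanced = triad_balance(edges)
print("balanced:", balanced)
print("unbalanced:", unbalanced)
```

Enumerating all node triples is cubic in the number of nodes, so this form suits small graphs only; a larger graph would call for iterating over each node's neighbor pairs instead.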

National Science Foundation

The yt project aims to produce an integrated science environment for collaboratively asking and answering astrophysical questions. To do so, it will encompass the creation of initial conditions, the execution of simulations, and the detailed exploration and visualization of the resultant data. It will also provide a standard framework, based on physical quantities, for interoperability between codes.

Development of yt is driven by a commitment to Open Science principles as manifested in participatory development, reproducibility, documented and approachable code, a friendly and helpful community of users and developers, and Free and Libre Open Source Software.


Aug. 31, 2017

Members of the Whole Tale Archaeology Working Group will meet with fellow computational archaeologists, environmental scientists, and other researchers for the first "Prov-a-thon" on practical tools for reproducible science. Held in conjunction with the DataONE All-Hands Meeting in Santa Ana Pueblo, New Mexico, the two-day workshop on August 31 and September 1 is cosponsored by the NSF-funded projects Whole Tale, DataONE, and the Arctic Data Center.

The goal of the workshop is to expose scientists to existing and emerging provenance tools from DataONE, Whole Tale, and other projects (e.g., SKOPE), and conversely, to gather feedback, new requirements, and new ideas for effective uses of provenance from the scientific community. The first day of the workshop...

Aug. 25, 2017

Thanks to a new online resource for paleoenvironmental data and models under development at Illinois and partner institutions, historian Richard Flint can gauge whether environmental factors played an important role in driving the migration of Pueblo Indians from the Spanish province of New Mexico in the seventeenth century. Using SKOPE (Synthesizing Knowledge of Past Environments), scholars such as Flint and the larger community of archaeologists will be able to discover, explore, visualize, and synthesize knowledge of environments in the recent or remote past.

"We are aiming to support different types of users—from researchers asking fundamental questions in the historical social sciences using climate retrodictions from tree-ring...

Aug. 10, 2017

The yt project, an open science environment created to address astrophysical questions through analysis and visualization, has been awarded a $1.6 million grant from the National Science Foundation (NSF) to continue developing its software. This grant will enable yt to expand and begin to support other domains beyond astrophysics, including weather, geophysics and seismology, molecular dynamics, and observational astronomy. It will also support the development of curricula for Data Carpentry, to ease the onramp for scientists new to data from these domains.

iSchool Assistant Professor Matthew Turk is leading the project with Nathan Goldbaum, Kacper Kowalik, and Meagan Lang of the National Center for Supercomputing Applications (NCSA) and in collaboration with Ben Holtzman at Columbia University in the City of New York and Leigh Orf at the...

Jul. 24, 2017

Assistant Professor Matthew Turk is partnering on a project to help resolve the growing gap between food supply and demand in the face of global climate change. Led by Amy Marshall-Colón, principal investigator and assistant professor of plant biology, Crops in silico (Cis) will integrate a suite of virtual plant models at different scales through $274,000 in funding from The Foundation for Food and Agriculture Research (FFAR), a nonprofit organization that builds unique partnerships to support innovative and actionable science addressing today's food and agriculture challenges. The FFAR grant matches seed funding the project has received from the Institute for Sustainability,...

May 24, 2017

Assistant Professor Jana Diesner and Professor Ted Underwood will present at Cultural Analytics 2017, a symposium devoted to new research in the fields of computational and data-intensive cultural studies, which will be held at the University of Notre Dame on May 26-27.

Diesner will give the talk, "Impact Assessment of Information Products and Data Provenance," on May 26. Her talk explores the question of how we can assess the impact of information products on people beyond relying on count metrics and by analyzing the substance of user-generated content. Diesner also addresses how limitations with the collection, quality, and provenance of large-scale social interaction data impact research outcomes and how we can measure these effects. 

From the abstract: I present our work on developing new computational solutions for identifying the impact of information products on people by leveraging...

Apr. 26, 2017

The iSchool and University Library are partners on a National Leadership Grant for Libraries awarded by the Institute of Museum and Library Services (IMLS). The grant supports work to hold a national forum and develop a white paper aimed at simplifying scholars' access to in-copyright and access-restricted texts for computational analysis and data mining research.

Text data mining and analysis are important research methods for scholars. However, efforts to access and analyze data sets are frequently complicated when texts are protected by copyright or other intellectual property restrictions.

The forum will bring together stakeholders in the areas of libraries, research, and publishing to discuss and recommend a research, policy, and practice framework that guides scholarly access to protected texts for data mining and other analyses. Thereafter, the grant partners will produce a white paper to summarize the discussions and present best practices and policy...

Apr. 3, 2017

Jodi Schneider (MS '08), assistant professor, is the recipient of a start-up allocation award from the Extreme Science and Engineering Discovery Environment (XSEDE). XSEDE is a project of the National Science Foundation that provides researchers with access to the world’s most advanced and powerful collection of integrated digital resources and services.

The award will support Schneider's research in biomedical informatics. The goal of her project is to make sense of large-scale networks of knowledge in biomedical literature. Her underlying code and data are provided by collaborators at the National Library of Medicine, who used text mining to process data from NLM's PubMed/MEDLINE to create a new database, SemMedDB.

"SemMedDB is a database with 'predications' like Drug X treats Disease Y. We consider this as a semantic network with drugs as vertices and relationships (e.g., treats) as edges. You can think of the...