Data Science

Using statistical and computational techniques to discover and extract knowledge from structured and unstructured data.

Researchers Working in this Area

Related Research Projects

Advancing STEM Online Learning by Augmenting Accessibility with Interactive Companionship

Time frame
2021-2024
Investigator
Yun Huang
Total funding to date
$526,006.00
Funding agency
National Science Foundation

Full title: Collaborative Research: Advancing STEM Online Learning by Augmenting Accessibility with Interactive Companionship

Videos are a popular option for online learning, and captions are essential for accessibility. Two types of captions exist: typical closed captions and explanatory captions. Closed captions…

AEOLIAN (Artificial Intelligence for Cultural Organizations)

Time frame
2021-2023
Investigator
Glen Layne-Worthey
Total funding to date
$49,820.00
Funding agency
National Endowment for the Humanities

Access to many digital archival collections is limited by factors such as privacy concerns and copyright. AEOLIAN combines innovative AI methods with the knowledge of scholars from multiple cultural institutions to make these collections more accessible. Additionally, the project aims to foster collaboration among scholars and practitioners from…

AI Institute Artificial Intelligence for Future Agricultural Resilience Management and Sustainability (AIFARMS)

Time frame
2020-2022
Investigator
Jingrui He
Total funding to date
$7,909.00
Funding agency
National Science Foundation

This project brings together researchers in AI and agriculture, combining their expertise to promote advances in agriculture through AI. The AIFARMS mission is to use core AI research areas such as computer vision, machine learning, data science, soft object manipulation, and intuitive human-robot interaction to address major challenges in agriculture:

  • Sustainable intensification…

Automated Indexing for Publication Types and Study Design

Time frame
2023-Present
Investigators
Neil Smalheiser, Jodi Schneider, Halil Kilicoglu
Total funding to date
$947,925.00
Funding agency
National Institutes of Health

This project aims to improve upon a tool clinicians, researchers, and systematic reviewers use to retrieve biomedical articles from bibliographic databases. Associate Professors Halil Kilicoglu and Jodi Schneider will work with Affiliate Professor Neil Smalheiser, professor of psychiatry at the University of Illinois Chicago, on the project, which has received funding from the National Institutes…

BD Hubs: Collaborative Proposal: Midwest: Midwest Big Data Hub: Building Communities to Harness the Data Revolution

Time frame
2019-Present
Investigator
Catherine Blake
Total funding to date
$2,883,274.00
Funding agency
National Science Foundation

The Midwest Big Data Hub (MBDH) is a network of regional institutions created to facilitate collection, management, and use of complex information, creating active partnerships of experts and resources to address issues relevant to life in the Midwest.

MBDH aims to help member organizations working in Big Data coordinate…

Big Data-Theoretic Approach to Quantify Organizational Failure Mechanisms in Probabilistic Risk Assessment

Time frame
2015-Present
Investigators
Zahra Mohaghegh, Catherine Blake
Total funding to date
$899,663.00
Funding agency
National Science Foundation

Catastrophic events such as Fukushima and Katrina have made it clear that integrating physical and social causes of failure into a cohesive modeling framework is critical in order to prevent complex technological accidents and to maintain public safety and health. In this research, experts in Probabilistic Risk Assessment (PRA), Organizational Behavior and Information Science and Data…

CAREER: III: Modeling the Heterogeneity of Heterogeneity: Algorithms, Theories and Applications

Time frame
2019-Present
Investigator
Jingrui He
Total funding to date
$415,836.00
Funding agency
National Science Foundation

As an intrinsic property of big data, data heterogeneity can be seen in a variety of real-world applications, ranging from security to manufacturing, from healthcare to crowdsourcing. Many high-impact data mining applications exhibit the co-existence of multiple types of heterogeneity, such as different classification tasks, different data sources, and different labeling oracles.…

Collaborative Research: Changes in Molecular Gas and Galaxy Properties Over Time in the Era of Integral Field Unit Surveys

Time frame
2016-Present
Investigator
Matthew Turk
Total funding to date
$267,664.00
Funding agency
National Science Foundation

Changes of a galaxy's properties over time are driven by the quantity of its cold gas, the raw material from which stars form. Thus, understanding the properties of a galaxy's cold gas component will tell us both how the star formation process changes over time and how this affects galaxies. Using a recently completed survey of carbon monoxide gas in a sample of nearby galaxies, the project…

Collaborative Research: Accelerating Synthetic Biology Discovery & Exploration through Knowledge Integration

Time frame
2019-Present
Investigator
J. Stephen Downie
Total funding to date
$211,699.00
Funding agency
National Science Foundation

The scientific challenge for this project is to accelerate discovery and exploration of the synthetic biology design space. In particular, many parts used in synthetic biology come from or are initially tested in a simple bacterium, E. coli, but many potential applications in energy, agriculture, materials, and health require either different bacteria or higher-level organisms (yeast for…

Collaborative Research: CDS&E: Renaissance Simulations Laboratory to Model and Explore the First Galaxies in the Universe

Time frame
2016-Present
Investigator
Matthew Turk
Total funding to date
$171,469.00
Funding agency
National Science Foundation

A great challenge in astrophysics is to understand in detail how the initial smooth distribution of matter in the early Universe formed the first galaxies. Complementing observations of real galaxies, researchers use computational simulations to model the early Universe and study the results. This process allows one to learn how these first galaxies might have formed. However, the sheer size…

Collaborative Research: SI2-SSI: Inquiry-Focused Volumetric Data Analysis Across Scientific Domains: Sustaining and Expanding the yt Community

Time frame
2017-Present
Investigator
Matthew Turk
Total funding to date
$1,061,721.00
Funding agency
National Science Foundation

Scientific discovery across the physical sciences is increasingly dependent on the analysis of volumetric (three-dimensional) data, which may come from a supercomputer simulation, direct measurement, or mathematical models. Researchers typically seek to extract meaningful insights from this data by visualizing and analyzing it in various ways. The ways in which scientists process…

Data Storytelling Toolkit for Libraries (DSTL)

Time frame
2022-Present
Investigators
Kate McDowell, Matthew Turk
Total funding to date
$99,330.00
Funding agency
Institute of Museum and Library Services

The University of Illinois, in collaboration with multiple community college and public libraries, will develop a data storytelling toolkit for libraries (DSTL) to support meaningful and effective data communication, empowering libraries to utilize the rapidly growing data landscape. DSTL will connect real-world examples of data use with data stories (including narrative strategies and data…

DeepCrowd: A Crowd-assisted Deep Learning-based Disaster Scene Assessment System with Active Human-AI Interactions

Time frame
2021-2023
Investigator
Dong Wang
Total funding to date
$499,786.00
Funding agency
National Science Foundation

Full title: CHS: Small: DeepCrowd: A Crowd-assisted Deep Learning-based Disaster Scene Assessment System with Active Human-AI Interactions

This project addresses the application of AI to disaster scene assessment (DSA). AI currently has limited success with DSA; this project…

Harnessing Artificial Intelligence for Cartel Smuggling Study

Time frame
2019-2020
Investigator
Jingrui He
Total funding to date
$78,917.00
Funding agency
Arizona State University and the Department of Homeland Security

This joint effort with Arizona State University’s CAOE team aims to create a suite of effective and efficient AI tools for analyzing cartel smuggling activities, building upon the team’s expertise in machine learning, data mining, and visual analytics.

Identifying False HPV-Vaccine Information and Modeling Its Impact on Risk Perceptions

Time frame
2020-Present
Investigator
Jessie Chin
Total funding to date
$389,810.00
Funding agency
National Institutes of Health

Human papillomavirus (HPV) is the most common sexually transmitted infection in the U.S., with over 34,000 new HPV-related cancers diagnosed annually, according to the Centers for Disease Control and Prevention. An HPV vaccine, which was approved by the Food and Drug Administration (FDA) in 2006, is recommended as part of routine vaccinations for school-aged children. However, the vaccine's…

III: Small: Predictive Analysis of Diabetes Dedicated Social Networks

Time frame
2019-Present
Investigator
Jingrui He
Total funding to date
$448,049.00
Funding agency
National Science Foundation

This project will study diabetes dedicated social networks. It aims to harness diabetes patients' online social behaviors from multiple networks to predict their biomarker measurements such as glycated hemoglobin and fasting blood glucose. This project will provide a paradigm shift from exploration to prediction compared with state-of-the-art research on diabetes dedicated social networks,…

Improving Patient Outcomes by Listening to Their Social Media Communications

Time frame
2017-2019
Investigator
Ian Brooks
Total funding to date
$15,000.00
Funding agency
Homecare Education Advocacy & Resource Team Support

It is difficult to understand the effectiveness of various treatment options when a huge number of external factors such as lifestyle, diet, and environment affect the burden of a disease. A major barrier to understanding is the challenge of scale—sampling enough patients to separate the major, minor, and negligible factors. With access to a database of more than one trillion public social…

INDICATOR: An Information System for Monitoring the Health of a Community

Time frame
2007-2011
Investigator
Ian Brooks
Total funding to date
$300,723.00
Funding agency
Centers for Disease Control and Prevention, U.S. Department of Agriculture, Carle Foundation

INDICATOR is a novel information system for collecting, integrating, and analyzing data from multiple sources to provide public health decision makers with real-time data on the health of their community. Data come from sources as varied as emergency department visits, school attendance, veterinary clinics, and social media postings, and together they have been used to change public policy in outbreak…

Innovation in an Aging Society

Time frame
2013-2020
Investigator
Vetle Torvik
Total funding to date
$569,272.00
Funding agency
National Bureau of Economic Research

The U.S. scientific workforce is aging: the average age of both U.S. academics and medical school faculty has increased to the late 40s, from the early 40s in 1970. This aging is troubling because people are seen to make important scientific contributions early in their careers. Moreover, the U.S. is turning to innovation as an economic driver, and the aging of the population will both increase and…

Machine Learning Modeling for the Reactivity of Organic Contaminants in Engineered and Natural Environments

Time frame
2021-Present
Investigator
Dong Wang
Total funding to date
$150,001.00
Funding agency
National Science Foundation

With support from the Environmental Chemical Sciences Program of the NSF Division of Chemistry, the researchers will develop machine learning models to predict the reactivity of thousands of organic contaminants (OCs) in engineered (water) and natural (soil and sediment) environments. To assess and mitigate risks associated with this vast number of OCs, accurate predictive models are needed to…

MAIDR: Multimodal Access and Interactive Data Representation

Time frame
2023-Present
Investigator
JooYoung Seo
Total funding to date
$649,921.00
Funding agency
Institute of Museum and Library Services

This project will research the needs of professional data curators and blind patrons. In partnership with the National Center for Supercomputing Applications, Posit Public Benefit Corporation, the Chart2Music open-source project team, the Data Curation Network, and the National Federation of the Blind, JooYoung Seo will develop a multimodal data representation system. Through needs assessments…

National Forum: Data Mining Research Using In-Copyright and Limited-Access Text Datasets: Shaping a Research and Implementation Agenda for Researchers, Libraries, and Content Providers

Time frame
2017-2019
Investigator
Bertram Ludäscher
Total funding to date
$99,536.00
Funding agency
Institute of Museum and Library Services

Copyright law and resource licensing complicate the application of text data mining for research. This project convened a National Forum on Text Data Mining with Use-Limited Data in April 2018 that brought together 25 leading stakeholders selected among researchers, librarians, content providers, legal experts, and representatives of scholarly societies to articulate an agenda that provides…

Natural Language Processing to Assess and Improve Citation Integrity in Biomedical Publications

Time frame
2022-Present
Investigators
Halil Kilicoglu, Jodi Schneider
Total funding to date
$300,000.00
Funding agency
Office of Research Integrity, U.S. Department of Health and Human Services

This project will assist researchers and journals in evaluating citation behavior in biomedical publications. While citations play a fundamental role in the diffusion of scientific knowledge and assessment of research on a topic, they are often inaccurate (e.g., citation of nonexistent findings, inappropriate interpretation). This inaccuracy undermines the integrity of scientific literature…

Pathtracker: A smartphone-based system for mobile infectious disease detection and epidemiology

Time frame
2015-Present
Investigator
Ian Brooks
Total funding to date
$1,005,692.00
Funding agency
National Science Foundation

This project will develop a mobile sensor technology for performing detection and identification of viral and bacterial pathogens. By means of a smartphone-based detection instrument, the results are shared with a cloud-based data management service that will enable physicians to rapidly visualize the geographical and temporal spread of infectious disease. When deployed by a community of…

RareXplain: A Computational Framework for Explainable Rare Category Analysis

Time frame
2021-Present
Investigator
Jingrui He
Total funding to date
$500,000.00
Funding agency
National Science Foundation

This project will focus on real-world problems where underrepresented, rare (abnormal) examples play critical roles, such as defective silicon wafers resulting from a new semiconductor manufacturing process and rare but severe complications (e.g., kidney failure) among diabetes patients.

"This problem of explainable rare category analysis was motivated by my collaboration with IBM…

RIDIR: Collaborative Research: Developing and Deploying SKOPE - A Resource for Synthesizing Knowledge of Past Environments

Time frame
2016-Present
Investigator
Bertram Ludäscher
Total funding to date
$884,627.00
Funding agency
National Science Foundation

Recent research has demonstrated that investigations of contemporary societal problems can benefit from the use of long-term environmental data and from comparisons with cases in which the interactions of human societies with their environments are well documented over centuries. By providing easy access to time- and place-specific long-term environmental data, this project seeks to facilitate…

Single Interface for Music Score Searching and Analysis

Time frame
2015-Present
Investigator
J. Stephen Downie
Total funding to date
$15,000.00
Funding agency
Social Sciences and Humanities Research Council of Canada

Music prints and manuscripts created over the past thousand years sit on the shelves of libraries and museums around the globe. As these organizations digitize their collections, images of these scores are increasingly accessible online. However, the musical content remains difficult to search.

Google Books and HathiTrust have already made it possible to search the content of text…

Smart Water Crowdsensing: Examining How Innovative Data Analytics and Citizen Science Can Ensure Safe Drinking Water in Rural Versus Suburban Communities

Time frame
2021-Present
Investigator
Dong Wang
Total funding to date
$1,031,655.00
Funding agency
National Science Foundation

Monitoring drinking water contamination is vitally important to inform consumers about water safety, identify source water problems, and facilitate discussion of public health and the environment of our drinking water. The overall goal of this project is to develop a framework for reliable and timely detection of drinking water contamination to build sustainable and connected communities. It…

Socio-technical Data Analytics (SODA) Education

Time frame
2012-2016
Investigator
Catherine Blake
Total funding to date
$498,777.00
Funding agency
Institute of Museum and Library Services

This project will create both a master’s and doctoral-level specialization in Socio-technical Data Analytics (SODA). Partnerships with local researchers and businesses who already work with large datasets will enable master’s graduates to receive first-hand experience with both the social and technical implications of large digital data collections, and thus be well-prepared for leadership…

Teach High School Students about Cybersecurity and AI Ethics via Empathy-Driven Hands-On Projects

Time frame
2021-2023
Investigators
Yang Wang, Yun Huang
Total funding to date
$154,754.00
Funding agency
National Science Foundation

The Whole Tale

Time frame
2016-Present
Investigators
Bertram Ludäscher, Matthew Turk
Total funding to date
$4,986,951.00
Funding agency
National Science Foundation

Scholarly publications today are still mostly disconnected from the underlying data and code used to produce the published results and findings, despite an increasing recognition of the need to share all aspects of the research process. As data become more open and transportable, a second layer of research output has emerged, linking research publications to the associated data, possibly along…

Towards a Computational Framework for Disinformation Trinity: Heterogeneity, Generation, and Explanation

Time frame
2020-Present
Investigator
Jingrui He
Total funding to date
$319,568.00
Funding agency
Arizona State University

This project will study foreign influence via the lens of disinformation on news media from a computational perspective. The researchers will use Explainable Heterogeneous Adversarial Machine Learning (EXHALE) to address the limitations of current techniques in terms of comprehension, characterization, and explainability.

Towards a Wearable Alcohol Biosensor: Examining the Accuracy of BAC Estimates from New-Generation Transdermal Technology using Large-Scale Human Testing and Machine Learning Algorithms

Time frame
2021-Present
Investigator
Nigel Bosch
Total funding to date
$21,267.00
Funding agency
National Institutes of Health

This NIH-funded project focuses on machine learning approaches for translating transdermal alcohol content (i.e., alcohol measured from a person’s skin) into blood alcohol content (“BAC”). Modern transdermal sensors are small, easy to use, and measure transdermal alcohol content frequently, but lag behind typical measures of BAC (especially breathalyzers) in terms of accuracy. This project…

Weakly Supervised Graph Neural Networks

Time frame
2021-Present
Investigator
Jingrui He
Total funding to date
$149,921.00
Funding agency
National Science Foundation

Graph neural networks have proven to be a powerful tool for harnessing graph data, which is widely used for representing rich relational information in multiple areas. However, the performance of graph neural networks largely depends on the amount of labeled data, which is subject to an expensive and time-consuming annotation process. As a result, much of the data remains unlabeled, a problem known as label scarcity.…

yt

Time frame
2012-2017
Investigator
Matthew Turk
Total funding to date
$2,000,000.00
Funding agency
National Science Foundation

The yt project aims to produce an integrated science environment for collaboratively asking and answering astrophysical questions. To do so, it will encompass the creation of initial conditions, the execution of simulations, and the detailed exploration and visualization of the resultant data. It will also provide a standard framework based on physical quantities interoperability between codes…

News Stories