Data Analytics

RELATED RESEARCH PROJECTS

Institute of Museum and Library Services

This project will create both a master’s and a doctoral-level specialization in Socio-technical Data Analytics (SODA). Partnerships with local researchers and businesses that already work with large datasets will give master’s graduates first-hand experience with both the social and technical implications of large digital data collections, preparing them for leadership roles in academic and corporate environments. Similarly, doctoral students will consider multiple stages of the information lifecycle, which will help to ensure that their research findings generalize to a range of scholarly and business practices. Case studies from these partners will be incorporated into new courses that will initially be held on campus and will later be evolved to the School...

National Center for Supercomputing Applications

Assistant Professor Jana Diesner has received a Faculty Fellowship and seed funding for her project, “Predictive Modeling for Impact Assessment,” from the National Center for Supercomputing Applications (NCSA). Diesner collaborates closely with NCSA scientists on the project, which builds on her work developing computational solutions to assess the impact of issue-focused information projects such as social justice documentaries and books. Her research team leverages big social data for this purpose and combines techniques from machine learning and natural language processing to identify a fine-grained set of impact factors from textual data sources such as news articles, reviews, and social media. This project aims to locate...

Ford Foundation

Films are produced, screened and perceived as part of a larger and continuously changing ecosystem that involves multiple stakeholders and themes. This project will measure the impact of social justice documentaries by capturing, modeling and analyzing the map of these stakeholders and themes in a systematic, scalable and analytically rigorous fashion. This solution will result in a validated, re-useable and end-user friendly methodology and technology that practitioners can use to assess the long-term impact of media productions beyond the number of people who have seen a screening or visited a webpage. Moreover, bringing the proposed computational methodology into a real-world application context can serve as a case-study for demonstrating the usability of this cutting-edge solution...

Social Sciences and Humanities Research Council of Canada

Music prints and manuscripts created over the past thousand years sit on the shelves of libraries and museums around the globe. As these organizations digitize their collections, images of these scores are increasingly accessible online. However, the musical content remains difficult to search.

Google Books and HathiTrust have already made it possible to search the content of text documents through Optical Character Recognition (OCR), which transforms digital images of texts into a symbolic representation that can be searched by computers. For digital images of musical scores, the analogous technology is Optical Music Recognition (OMR).

The research team is working to improve OMR technology so that computers can recognize the musical symbols in these images, enabling us...

INDICATOR is a novel information system for collecting, integrating, and analyzing data from multiple sources to give public health decision makers real-time data on the health of their community. Data come from sources as varied as emergency department visits, school attendance, veterinary clinics, and social media postings, and together they have been used to change public policy in outbreak events.

Funding for this project was provided by the Carle Foundation, Centers for Disease Control and Prevention, and the U.S. Department of Agriculture.

National Science Foundation

Scholarly publications today are still mostly disconnected from the underlying data and code used to produce the published results and findings, despite an increasing recognition of the need to share all aspects of the research process. As data become more open and transportable, a second layer of research output has emerged, linking research publications to the associated data, possibly along with its provenance. This trend is rapidly followed by a new third layer: communicating the process of inquiry itself by sharing a complete computational narrative that links method descriptions with executable code and data, thereby introducing a new era of reproducible science and accelerated knowledge discovery. In the Whole Tale (WT) project, all of these components are linked and accessible...

National Science Foundation

Catastrophic events such as Fukushima and Katrina have made it clear that integrating physical and social causes of failure into a cohesive modeling framework is critical in order to prevent complex technological accidents and to maintain public safety and health. In this research, experts in Probabilistic Risk Assessment (PRA), Organizational Behavior, and Information Science and Data Analytics collaborate to answer the following key questions: what social and organizational factors affect technical system risk; how and why do these factors influence risk; and how much do they contribute to risk? In addition to scientific contributions to organizational science, PRA, and data analytics, this research provides regulatory and industry decision-makers with...

Korea Institute of Science and Technology Information

How do limitations and a lack of transparency in data quality and data provenance bias research outcomes, and how can we detect and mitigate these problems? For example, we have been investigating the impact of entity resolution errors on network analysis results. We found that commonly reported network metrics and derived implications can strongly deviate from the truth (as established from gold standard data or approximations thereof), depending on the effort dedicated to entity resolution.
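As a rough illustration of how entity resolution can shift a network metric, the sketch below computes node degrees on a tiny co-occurrence graph before and after merging two spellings of one name. The names, the edge list, and the `degrees` helper are all hypothetical and are not taken from the project; this is only a toy example of the general effect.

```python
from collections import defaultdict

def degrees(edges, alias=None):
    """Count distinct neighbors per node, optionally merging aliased names."""
    alias = alias or {}
    neighbors = defaultdict(set)
    for u, v in edges:
        # Map each name to its canonical form (hypothetical alias table).
        u, v = alias.get(u, u), alias.get(v, v)
        if u != v:  # ignore self-loops created by merging
            neighbors[u].add(v)
            neighbors[v].add(u)
    return {node: len(nbrs) for node, nbrs in neighbors.items()}

# Invented edges: "J. Smith" and "John Smith" are assumed to be one person.
raw_edges = [
    ("J. Smith", "A. Lee"),
    ("John Smith", "B. Kim"),
    ("John Smith", "A. Lee"),
    ("B. Kim", "C. Diaz"),
]

unresolved = degrees(raw_edges)
resolved = degrees(raw_edges, alias={"J. Smith": "John Smith"})

print(len(unresolved), len(resolved))
print(unresolved["A. Lee"], resolved["A. Lee"])
```

Without resolution the graph has one node too many and "A. Lee" appears to have two distinct neighbors; after merging, the node count and that degree both drop.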

How can we use user-generated content to construct, infer, or refine network data? We have been tackling this problem by leveraging communication content produced and disseminated in social networks to enhance graph data. For example, we have used domain-adjusted sentiment analysis to label graphs with valence values in order to enable triadic balance assessment. The resulting method enables fast and systematic sign detection, eliminates the need for surveys or manual link labeling, and reduces issues with leveraging user-generated (meta)data.
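Triadic balance assessment can be sketched in a few lines once edges carry signs. The example below uses the standard structural balance rule, under which a closed triad is balanced when the product of its three edge signs is positive. The edge list and the function name are hypothetical, and the signs are entered by hand here, whereas the project derives them from sentiment analysis.

```python
from itertools import combinations

def balanced_triad_fraction(signed_edges):
    """Return the share of closed triads whose edge signs multiply to +1."""
    # Store each undirected edge once, keyed by an order-independent pair.
    sign = {frozenset((u, v)): s for u, v, s in signed_edges}
    nodes = sorted({node for pair in sign for node in pair})
    balanced = total = 0
    for a, b, c in combinations(nodes, 3):
        sides = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(side in sign for side in sides):  # only closed triads count
            total += 1
            product = sign[sides[0]] * sign[sides[1]] * sign[sides[2]]
            balanced += product > 0
    return balanced / total if total else None

# Invented signed edges: +1 for positive valence, -1 for negative valence.
edges = [
    ("ann", "bob", +1),
    ("bob", "cat", +1),
    ("ann", "cat", +1),  # ann-bob-cat: all positive, balanced
    ("bob", "dan", -1),
    ("cat", "dan", +1),  # bob-cat-dan: one negative edge, unbalanced
]

fraction = balanced_triad_fraction(edges)
print(fraction)
```

Enumerating all node triples is only practical for small graphs; a larger network would need a neighbor-based triangle search instead.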

yt
National Science Foundation

The yt project aims to produce an integrated science environment for collaboratively asking and answering astrophysical questions. To do so, it will encompass the creation of initial conditions, the execution of simulations, and the detailed exploration and visualization of the resultant data. It will also provide a standard framework, based on physical quantities, for interoperability between codes.

Development of yt is driven by a commitment to Open Science principles as manifested in participatory development, reproducibility, documented and approachable code, a friendly and helpful community of users and developers, and Free and Libre Open Source Software.

National Science Foundation

This project will develop a mobile sensor technology for detecting and identifying viral and bacterial pathogens. Results from a smartphone-based detection instrument are shared with a cloud-based data management service that will enable physicians to rapidly visualize the geographical and temporal spread of infectious disease. When deployed by a community of medical users (such as veterinarians or point-of-care clinicians), the PathTracker system will enable rapid determination and reporting of instances of infectious disease, informing treatment and quarantine responses that are currently not possible with tests performed at central laboratory facilities.

Immediate uses for the technology are for diagnosis of viral infection in human...

Korea Institute of Science and Technology Information

The project team will work on extracting key concepts from scholarly publications and explore techniques for building a taxonomy of extracted concepts by leveraging open knowledge bases (e.g., Wikipedia). The outcome of this process will be evaluated for various science and technology knowledge platform-based analysis services. The techniques, which reduce semantic ambiguity, will analyze conceptual novelty and expertise of researchers / research institutes across time, leading to a better understanding of the evolution of scientific domains in a scholarly community. The research will also lead to the development of open source tools to allow this research work to be replicated.

IN THE NEWS

Jun. 1, 2018

PhD student Yingjun Guan will present his research at the Conference on Statistical Learning and Data Science / Nonparametric Statistics, which will be held June 4-6 at Columbia University. The conference brings together researchers in statistical machine learning and data mining from academia, industry, and government to discuss topics such as big data analytics, classification, learning theory, network analysis, and signal and image processing. The Department of Statistics at the University of Illinois is a cosponsor of the event.

Guan will present his poster, "IMDB Review Mining and Movie Recommendation," in which he examines how movie- and user-related information extracted from the Internet Movie Database (IMDB) can be used to predict movie ratings. 

"Our project plans to benefit both the movie production corporations and the audience," Guan explained. "Data mining and supervised machine...

May 18, 2018

If you love to talk about data management, data curation, and data analysis, we'd love to chat with you at Data & Drinks, the second summer professional networking event from the iSchool at Illinois. We aim to provide a space for central Illinois residents and visitors who work with data to meet colleagues in the field and have productive conversations about our challenges, ideas, and projects.

The iSchool will provide tasty snacks and the venue will have a cash bar for your convenience. Please register to attend in advance of the event.

Apr. 17, 2018

Assistant Professor Jana Diesner is the program co-chair of the 3rd International Workshop on Social Sensing (SocialSens 2018). The workshop will be held on April 17 in Orlando, in conjunction with the ACM/IEEE International Conference on Internet of Things Design and Implementation (IoTDI 2018).

SocialSens 2018 will bring together researchers and engineers from academia, industry, and government to present recent advances in social sensing, as described on the website:

Social sensing has emerged as a new paradigm for collecting sensory measurements by means of "crowd-sourcing" sensory data collection tasks to a human population. Humans can act as sensor carriers (e.g., carrying GPS devices that share location data), sensor operators (e.g., taking pictures with smart phones), or as sensors themselves (e.g., sharing their observations on Twitter). The proliferation of...

Apr. 16, 2018

A group of cross-disciplinary experts gathered in Chicago on April 5 and 6 for a national forum on text data mining research. The forum, Data Mining Research Using In-copyright and Limited-access Text Datasets, was coordinated by iSchool faculty and staff and funded by the Institute of Museum and Library Services (grant LG-73-17-0070-17). 

Principal Investigator Bertram Ludäscher, professor and director of the iSchool’s Center for Informatics Research in Science and Scholarship (CIRSS), led the effort with co-principal investigators Megan Senseney, CIRSS research scientist; Beth Sandore Namachchivaya, university librarian at the University of Waterloo; and investigator Eleanor Dickson, visiting digital humanities...

Mar. 23, 2018

Assistant Professor Jodi Schneider, CAS student Janina Sarol (MSIM '17), and undergraduate Linxi Liu will discuss their research at the European Conference on Information Retrieval (ECIR 2018) in Grenoble, France. Sarol will present their paper, "Testing a Citation and Text-Based Framework for Retrieving Publications for Literature Reviews," at the conference’s Bibliometric-enhanced Information Retrieval workshop on March 26.

Using the framework they created, the researchers collected articles that were connected in the citation network and filtered them using a combination of citation- and text-based criteria. Their paper discusses how well the framework performed in its first implementation compared with the conventional search methods used in six published systematic reviews.

"Using different combinations of seed articles, we were able to retrieve up to eighty-seven percent of the total included studies in the...

Jan. 4, 2018

Assistant Professor Jodi Schneider (MS ’08) has received funding from the National Institutes of Health to develop a series of automated informatics tools for reviewing medical literature more quickly and easily. The project, “Text Mining Pipeline to Accelerate Systematic Reviews in Evidence-Based Medicine,” was funded through a subaward from the University of Illinois at Chicago that will cover $228,006 in direct costs. Schneider is co-principal investigator with Neil Smalheiser, associate professor of psychiatry at UIC, and Aaron Cohen, a professor in the Oregon Health & Science University’s Department of Medical Informatics and Clinical Epidemiology.

The team is currently testing three informatics tools: a meta-search engine for finding articles in the medical literature across different databases; an automated randomized controlled trial (RCT) tagger for identifying articles that report randomized controlled clinical trials in humans; and an aggregator tool that clusters together RCT...

Nov. 30, 2017

Associate Professor Catherine Blake has been named the iSchool's Centennial Scholar for 2017-2018. The award is endowed by alumni and friends of the School and given in recognition of outstanding accomplishments and/or professional promise in information sciences.

A leading researcher in text mining medical literature, Blake has returned from a year as a faculty fellow at the Lister Hill National Center for Biomedical Communications, a research and development unit of the National Library of Medicine at the National Institutes of Health (NIH). There she worked on projects in semantic knowledge representation and medical ontology research.

Blake's earlier focus on how people synthesize evidence from literature directly informs her computational approaches to accelerate scientific discovery. She utilizes her industrial experience as a software developer, formal training in information and computer science, and more than a decade of experience in text mining scientific...
