This project will create a specialization in Socio-technical Data Analytics (SODA) at both the master’s and doctoral levels. Partnerships with local researchers and businesses who already work with large datasets will give master's students first-hand experience with both the social and technical implications of large digital data collections, preparing them well for leadership roles in academic and corporate environments. Similarly, doctoral students will consider multiple stages of the information lifecycle, which will help ensure that their research findings generalize to a range of scholarly and business practices. Case studies from these partners will be incorporated into new courses that will initially be held on campus and will later be evolved to the School...
RESEARCHERS WORKING IN THIS AREA
RELATED RESEARCH PROJECTS
Institute of Museum and Library Services
National Center for Supercomputing Applications
Assistant Professor Jana Diesner received a Faculty Fellowship and seed funding for her project, “Predictive Modeling for Impact Assessment,” from the National Center for Supercomputing Applications (NCSA). Diesner collaborates closely with NCSA scientists on the project, which builds on her work developing computational solutions to assess the impact of issue-focused information projects such as social justice documentaries and books. Her research team leverages big social data for this purpose, combining techniques from machine learning and natural language processing to identify a fine-grained set of impact factors from textual data sources such as news articles, reviews, and social media. This project aims to locate...
Films are produced, screened, and perceived as part of a larger and continuously changing ecosystem that involves multiple stakeholders and themes. This project will measure the impact of social justice documentaries by capturing, modeling, and analyzing the map of these stakeholders and themes in a systematic, scalable, and analytically rigorous fashion. The result will be a validated, reusable, and end-user-friendly methodology and technology that practitioners can use to assess the long-term impact of media productions beyond the number of people who have seen a screening or visited a webpage. Moreover, bringing the proposed computational methodology into a real-world application context can serve as a case study for demonstrating the usability of this cutting-edge solution...
Social Sciences and Humanities Research Council of Canada
Music prints and manuscripts created over the past thousand years sit on the shelves of libraries and museums around the globe. As these organizations digitize their collections, images of these scores are increasingly accessible online. However, the musical content remains difficult to search.
Google Books and HathiTrust have already made it possible to search the content of text documents through Optical Character Recognition (OCR), which transforms digital images of texts into a symbolic representation that can be searched by computers. For digital images of musical scores, the analogous technology is Optical Music Recognition (OMR).
The research team is working to improve OMR technology so that computers can recognize the musical symbols in these images, enabling us...
INDICATOR is a novel information system for collecting, integrating, and analyzing data from multiple sources to provide public health decision makers with real-time data on the health of their community. The data come from sources as varied as emergency department visits, school attendance, veterinary clinics, and social media postings, and together they have been used to change public policy during outbreak events.
Funding for this project was provided by the Carle Foundation, Centers for Disease Control and Prevention, and the U.S. Department of Agriculture.
National Science Foundation
Scholarly publications today are still mostly disconnected from the underlying data and code used to produce the published results and findings, despite an increasing recognition of the need to share all aspects of the research process. As data become more open and transportable, a second layer of research output has emerged, linking research publications to the associated data, possibly along with its provenance. This trend is now rapidly being followed by a third layer: communicating the process of inquiry itself by sharing a complete computational narrative that links method descriptions with executable code and data, thereby ushering in a new era of reproducible science and accelerated knowledge discovery. In the Whole Tale (WT) project, all of these components are linked and accessible...
Big Data-Theoretic Approach to Quantify Organizational Failure Mechanisms in Probabilistic Risk Assessment
National Science Foundation
Catastrophic events such as Fukushima and Katrina have made it clear that integrating the physical and social causes of failure into a cohesive modeling framework is critical to preventing complex technological accidents and maintaining public safety and health. In this research, experts in Probabilistic Risk Assessment (PRA), organizational behavior, and information science and data analytics collaborate to answer three key questions: what social and organizational factors affect technical system risk; how and why do these factors influence risk; and how much do they contribute to risk? In addition to scientific contributions to organizational science, PRA, and data analytics, this research provides regulatory and industry decision-makers with...
IN THE NEWS
Assistant Professor Jana Diesner will discuss current open science issues involving human-centered and online data, along with her related research, at the Open Science Conference 2017, which will be held March 21-22 in Berlin. The conference is the fourth international conference of the Leibniz Research Alliance Science 2.0, which addresses changes in science and the science system related to the new forms of participation, communication, collaboration, and open discourse now possible through the web.
This year's conference will focus on open educational resources—course materials (print and digital), modules, streaming videos, software, and other tools, materials, or techniques used to support open access to knowledge. It will offer presentations by international experts, including Diesner, as well as a poster session, a panel discussion, and workshops.
Doctoral student Shadi Rezapour and Assistant Professor Jana Diesner will present a paper at the 20th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW 2017), which will be held February 25-March 1 in Portland, Oregon. CSCW brings together experts from industry and academia to explore the technical, social, material, and theoretical challenges of designing technology to support collaborative work and life activities.
Rezapour and Diesner will present "Classification and Detection of Micro-Level Impact of Issue-Focused Films based on Reviews."
Abstract: We present novel research at the intersection of review mining and impact assessment of issue-focused information products, namely documentary films. We develop and evaluate a theoretically grounded classification schema, related codebook, corpus annotation, and prediction model for detecting multiple types of impact that...
Join Blake Giles, Manager of Research Park Operations at Intelligent Medical Objects (IMO), to hear more about IMO's work and the summer internship opportunities available there. Online students are welcome to participate remotely.
Doctoral student Shadi Rezapour and Assistant Professor Jana Diesner will present a paper at the 11th IEEE International Conference on Semantic Computing (ICSC 2017), which will be held January 30 through February 1 in San Diego, California. ICSC 2017 provides an international forum for researchers and practitioners in academia and industry to present research that advances the state of semantic computing and identifies emerging research topics.
Rezapour and Diesner will present "Identifying the Overlap between Election Result and Candidates' Ranking based on Hashtag-Enhanced, Lexicon-Based Sentiment Analysis." The paper's coauthors include Lufan Wang (Department of Civil and Environmental Engineering) and Omid Abdar (Department of Linguistics).
Abstract: The popularity and availability of Twitter as a service and a data source have fueled the interest in sentiment analysis. Previous research has shed...
J. Stephen Downie, professor and associate dean for research, participated in the Center for Open Data in the Humanities (CODH) seminar, "Big Data and Digital Humanities," on January 23 at the National Institute of Informatics in Tokyo, Japan.
The CODH, which started in April 2016, will be formally established as a center in April 2017. It involves faculty from the National Institute of Informatics and The Institute of Statistical Mathematics, both in Japan, who collaborate with computer scientists and humanities scholars around the globe. CODH promotes research and development to improve access to humanities data, applying open science principles along with the latest technology in informatics and statistics.
Downie gave the presentation, "Digital humanities using both closed and open data: Use cases from the HathiTrust Research Center":
The HathiTrust Digital...
By using products such as soap, shampoo, body lotion, toothpaste and makeup, the average consumer may be exposed to dozens of chemicals each day. It's not easy, though, to know exactly what is in many consumer products or what potential risks they pose, either individually or in combination.
A doctoral student and a professor in the University of Illinois School of Information Sciences are using an informatics approach to help prioritize chemical combinations for further testing by determining the prevalence of individual ingredients and their most likely combinations in consumer products.
Doctoral student Henry Gabb and professor Catherine Blake published the results of the first phase of their work in Environmental Health Perspectives, a journal of the National Institute of Environmental Health Sciences, part of the National Institutes of Health...
Doctoral students Ming Jiang and Shubhanshu Mishra will present research papers at the 26th International Conference on Computational Linguistics (COLING), which will be held December 11-16 in Osaka, Japan. COLING, held every two years, is one of the top international conferences in natural language processing and computational linguistics, covering research topics such as question answering, text summarization, information extraction, and discourse structure.
Jiang will present a paper coauthored with Assistant Professor Jana Diesner titled "Says Who...? Identification of Expert versus Layman Critics’ Reviews of Documentary Films."
Abstract: We extend classic review mining work by building a binary classifier that predicts whether a review of a documentary film was written by an expert or a...