This project will create both a master’s and doctoral-level specialization in Socio-technical Data Analytics (SODA). Partnerships with local researchers and businesses who already work with large data sets will enable master's graduates to receive first-hand experience with both the social and technical implications of large digital data collections, and thus be well-prepared for leadership roles in academic and corporate environments. Similarly, doctoral students will consider multiple stages of the information lifecycle, which will help to ensure that their research findings will generalize to a range of scholarly and business practices. Case studies from these partners will be incorporated into new courses that will initially be held on campus and will later be evolved to the School...
RESEARCHERS WORKING IN THIS AREA
RELATED RESEARCH PROJECTS
Institute of Museum and Library Services
National Center for Supercomputing Applications
Assistant Professor Jana Diesner received a Faculty Fellowship and seed funding for her project, “Predictive Modeling for Impact Assessment,” from the National Center for Supercomputing Applications (NCSA). Diesner collaborates closely with NCSA scientists on the project, which builds on her work developing computational solutions to assess the impact of issue-focused information projects such as social justice documentaries and books. Her research team leverages big social data for this purpose and combines techniques from machine learning and natural language processing to identify a fine-grained set of impact factors from textual data sources such as news articles, reviews, and social media. This project aims to locate...
Films are produced, screened and perceived as part of a larger and continuously changing ecosystem that involves multiple stakeholders and themes. This project will measure the impact of social justice documentaries by capturing, modeling and analyzing the map of these stakeholders and themes in a systematic, scalable and analytically rigorous fashion. This solution will result in a validated, reusable and end-user-friendly methodology and technology that practitioners can use to assess the long-term impact of media productions beyond the number of people who have seen a screening or visited a webpage. Moreover, bringing the proposed computational methodology into a real-world application context can serve as a case study for demonstrating the usability of this cutting-edge solution...
Social Sciences and Humanities Research Council of Canada
Music prints and manuscripts created over the past thousand years sit on the shelves of libraries and museums around the globe. As these organizations digitize their collections, images of these scores are increasingly accessible online. However, the musical content remains difficult to search.
Google Books and HathiTrust have already made it possible to search the content of text documents through Optical Character Recognition (OCR), which transforms digital images of texts into a symbolic representation that can be searched by computers. For digital images of musical scores, the analogous technology is Optical Music Recognition (OMR).
The research team is working to improve OMR technology so that computers can recognize the musical symbols in these images, enabling us...
INDICATOR is a novel information system for collecting, integrating, and analyzing data from multiple sources to provide public health decision makers with real-time data on the health of their community. Data come from sources as varied as emergency department visits, school attendance, veterinary clinics, and social media postings, and together they have been used to change public policy in outbreak events.
Funding for this project was provided by the Carle Foundation, Centers for Disease Control and Prevention, and the U.S. Department of Agriculture.
National Science Foundation
Scholarly publications today are still mostly disconnected from the underlying data and code used to produce the published results and findings, despite an increasing recognition of the need to share all aspects of the research process. As data become more open and transportable, a second layer of research output has emerged, linking research publications to the associated data, possibly along with its provenance. This trend is rapidly followed by a new third layer: communicating the process of inquiry itself by sharing a complete computational narrative that links method descriptions with executable code and data, thereby introducing a new era of reproducible science and accelerated knowledge discovery. In the Whole Tale (WT) project, all of these components are linked and accessible...
Big Data-Theoretic Approach to Quantify Organizational Failure Mechanisms in Probabilistic Risk Assessment
National Science Foundation
Catastrophic events such as Fukushima and Katrina have made it clear that integrating physical and social causes of failure into a cohesive modeling framework is critical in order to prevent complex technological accidents and to maintain public safety and health. In this research, experts in Probabilistic Risk Assessment (PRA), Organizational Behavior and Information Science and Data Analytics disciplines collaborate to provide answers to the following key questions: what social and organizational factors affect technical system risk; how and why do these factors influence risk; and how much do they contribute to risk? In addition to scientific contributions to organizational science, PRA, and data analytics, this research provides regulatory and industry decision-makers with...
Korea Institute of Science and Technology Information
How do limitations and a lack of transparency in data quality and data provenance bias research outcomes, and how can we detect and mitigate these limitations? For example, we have been investigating the impact of entity resolution errors on network analysis results. We found that commonly reported network metrics and derived implications can strongly deviate from the truth (as established based on gold standard data or approximations thereof), depending on the efforts dedicated to entity resolution.
How can we use user-generated content to construct, infer or refine network data? We have been tackling this problem by leveraging communication content produced and disseminated in social networks to enhance graph data. For example, we have used domain-adjusted sentiment analysis to label graphs with valence values in order to enable triadic balance assessment. The resulting method enables fast and systematic sign detection, eliminates the need for surveys or manual link labeling, and reduces issues with leveraging user-generated (meta)data.
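As a rough illustration of the idea of labeling links with valence and then assessing triadic balance, the following minimal Python sketch may help. It is not the research team's method: the tiny word lexicon, the function names, and the sample messages are all hypothetical stand-ins, and a simple lexicon lookup is swapped in for the domain-adjusted sentiment analysis described above. A triad is treated as balanced when the product of its three edge signs is positive.

```python
from itertools import combinations

# Hypothetical toy lexicon (assumption); the project describes
# domain-adjusted sentiment analysis, which this does not reproduce.
LEXICON = {"great": 1, "thanks": 1, "agree": 1,
           "terrible": -1, "wrong": -1, "disagree": -1}


def edge_sign(messages):
    """Return +1, -1, or 0 (no evidence) from summed lexicon scores."""
    score = sum(
        LEXICON.get(token.strip(".,!?").lower(), 0)
        for message in messages
        for token in message.split()
    )
    return (score > 0) - (score < 0)


def signed_edges(messages_by_pair):
    """Map each undirected pair to a sign, skipping pairs with no evidence."""
    edges = {}
    for (u, v), messages in messages_by_pair.items():
        sign = edge_sign(messages)
        if sign != 0:
            edges[frozenset((u, v))] = sign
    return edges


def triad_balance(edges):
    """For every closed triad, report whether the product of signs is positive."""
    nodes = sorted({node for edge in edges for node in edge})
    result = {}
    for a, b, c in combinations(nodes, 3):
        pairs = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(pair in edges for pair in pairs):
            product = edges[pairs[0]] * edges[pairs[1]] * edges[pairs[2]]
            result[(a, b, c)] = product > 0
    return result


# Invented sample communication content between pairs of users.
messages = {
    ("ann", "bob"): ["Great point, thanks!"],
    ("bob", "cat"): ["I disagree, that is wrong."],
    ("ann", "cat"): ["Terrible idea."],
    ("cat", "dan"): ["I agree."],
}
edges = signed_edges(messages)
balance = triad_balance(edges)
print(balance)
```

In this toy data the only closed triad is ann, bob, and cat: one positive and two negative links, which counts as balanced under the sign-product rule.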
IN THE NEWS
Assistant Professor Jana Diesner and Professor Ted Underwood will present at Cultural Analytics 2017, a symposium devoted to new research in the fields of computational and data-intensive cultural studies, which will be held at the University of Notre Dame on May 26-27.
Diesner will give the talk, "Impact Assessment of Information Products and Data Provenance," on May 26. Her talk explores the question of how we can assess the impact of information products on people beyond relying on count metrics and by analyzing the substance of user-generated content. Diesner also addresses how limitations with the collection, quality, and provenance of large-scale social interaction data impact research outcomes and how we can measure these effects.
From the abstract: I present our work on developing new computational solutions for identifying the impact of information products on people by leveraging...
The iSchool and University Library are partners on a National Leadership Grant for Libraries awarded by the Institute of Museum and Library Services (IMLS). The grant supports work to hold a national forum and develop a white paper aimed at simplifying scholars' access to in-copyright and access-restricted texts for computational analysis and data mining research.
Text data mining and analysis are important research methods for scholars. However, efforts to access and analyze data sets are frequently complicated when texts are protected by copyright or other intellectual property restrictions.
The forum will bring together stakeholders in the areas of libraries, research, and publishing to discuss and recommend a research, policy, and practice framework that guides scholarly access to protected texts for data mining and other analyses. Thereafter, the grant partners will produce a white paper to summarize the discussions and present best practices and policy...
Jodi Schneider (MS '08), assistant professor, is the recipient of a start-up allocation award from the Extreme Science and Engineering Discovery Environment (XSEDE). XSEDE is a project of the National Science Foundation that provides researchers with access to the world’s most advanced and powerful collection of integrated digital resources and services.
The award will support Schneider's research in biomedical informatics. The goal of her project is to make sense of large-scale networks of knowledge in biomedical literature. Her underlying code and data are provided by collaborators at the National Library of Medicine, who used text mining to process data from NLM's PubMed/MEDLINE to create a new database, SemMedDB.
"SemMedDB is a database with 'predications' like Drug X treats Disease Y. We consider this as a semantic network with drugs as vertices and relationships (e.g., treats) as edges. You can think of the...
Assistant Professor Jana Diesner will discuss current issues with open science that involve human-centered and online data and her related research at the Open Science Conference 2017, which will be held March 21-22 in Berlin. The Open Science Conference 2017 is the fourth international conference of the Leibniz Research Alliance Science 2.0, which addresses changes in science and the science system that are related to new forms of participation, communication, collaboration, and open discourse now possible through the web.
This year's conference will focus on open educational resources—course materials (print and digital), modules, streaming videos, software, and other tools, materials, or techniques used to support open access to knowledge. It will offer presentations by international experts, including Diesner, as well as a poster session, a panel discussion, and workshops.
Doctoral student Shadi Rezapour and Assistant Professor Jana Diesner will present a paper at the 20th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW 2017), which will be held February 25-March 1 in Portland, Oregon. CSCW brings together experts from industry and academia to explore the technical, social, material, and theoretical challenges of designing technology to support collaborative work and life activities.
Rezapour and Diesner will present, "Classification and Detection of Micro-Level Impact of Issue-Focused Films based on Reviews."
Abstract: We present novel research at the intersection of review mining and impact assessment of issue-focused information products, namely documentary films. We develop and evaluate a theoretically grounded classification schema, related codebook, corpus annotation, and prediction model for detecting multiple types of impact that...
Join Blake Giles, Manager at Research Park Operations for Intelligent Medical Objects (IMO), to hear more about the work they do and the summer internship opportunities available at IMO. Online students are welcome to join remotely.
Doctoral student Shadi Rezapour and Assistant Professor Jana Diesner will present a paper at the 11th IEEE International Conference on Semantic Computing (ICSC 2017), which will be held January 30 through February 1 in San Diego, California. ICSC 2017 provides an international forum for researchers and practitioners in academia and industry to present research that advances the state of semantic computing and identifies emerging research topics.
Rezapour and Diesner will present, "Identifying the Overlap between Election Result and Candidates' Ranking based on Hashtag-Enhanced, Lexicon-Based Sentiment Analysis." The paper's coauthors include Lufan Wang (Department of Civil and Environmental Engineering) and Omid Abdar (Department of Linguistics).
Abstract: The popularity and availability of Twitter as a service and a data source have fueled the interest in sentiment analysis. Previous research has shed...