University of Illinois Extension

The Illinois Digital Innovation Leadership Program will increase opportunities for entrepreneurship, economic development, and innovation through the expansion of digital manufacturing, digital media production, and data analytics. Supported by the University of Illinois Extension, the project will engage Illinoisans with mobile digital design and innovation labs, or “DigiTech Hubs,” which will serve as high-tech inventor workshops equipped with tools for everything from audio production to 3D printing. Digital Innovation Leadership staff will work with 4-H clubs, public libraries, and public schools to develop permanent community-based and -supported studios, creating a network that will build statewide capacity in digital...

National Science Foundation

The goal of this research is to help researchers develop and use relatively simple tools to describe species in a way that makes those descriptions easier to share with other scientists and easier for computers to process and analyze. The approach is bottom-up and iterative, involving the rapid prototyping of tools, combining of existing tools, and the tailoring of applications developed for one purpose but now being reused for this scientific activity. Innovation from this project is applicable to the long-term development of open source software initiatives serving labs throughout the world. The project provides rich, real-world training for graduate students in library and information sciences, training them to be much-needed cross-disciplinary researchers in a field desperate for...

National Science Foundation

Data Observation Network for Earth (DataONE) is a collaborative, global project that is laying the groundwork for a new, innovative approach to conducting environmental science research. DataONE is a distributed framework and sustainable infrastructure poised to resolve many of the key challenges that hinder the realization of more global, open, and reproducible science, through four interrelated cyberinfrastructure (CI) activities:

  • significantly expanding the volume and diversity of data available to researchers for large-scale scientific innovation and discovery;
  • incorporating innovative and high-value science-enabling features into the DataONE CI;
  • maintaining and improving core software and...

National Science Foundation

Taxonomists are scientists who describe the world’s biodiversity. These descriptions of millions of species allow scientists to do many different kinds of research, including basic biology, environmental science, climate research, agriculture, and medicine. The problem is that describing any one species is not easy. The language used by taxonomists to describe their data is complex, and typically not easily understandable by computers or even by other scientists. This situation makes it harder to search for patterns across millions of species documented by thousands of researchers over many decades of work worldwide.

The goal of this research is to help researchers develop and use relatively simple tools to describe species in a way that makes those descriptions easier to share...

Institute of Museum and Library Services

The focus of this three-year, multisite project is development of app-based curricula and tools for use in school and public libraries. These tools will teach children aged eight to twelve how to build their own apps, providing them with early programming experience, and allow them to share their creations with other children. The project further establishes libraries as places to engage youth in STEM exploration and digital development that reflects their own experiences.

This project builds on a project conducted with support from a planning-phase grant from the Institute of Museum and Library Services titled, "Closing the App Gap." 

“The App Authors project is an exciting expansion...

Intelligent Medical Objects

Diesner’s team is developing a natural-language processing solution for probabilistic entity detection and classification in the domain of healthcare. The core of the solution is a set of prediction models built by using supervised and/or semi-supervised machine learning techniques. The resulting models can be used to annotate natural-language text documents for entity classes. The team will perform fact extraction from medical text documents as well as map tokens to predefined medical codes. Both tasks involve the same steps: 1) building and evaluating prediction models, 2) helping to integrate the prediction models into IMO’s workflow, 3) building an inference engine for practical applications, 4) building a technical solution with which IMO can update the prediction models, and 5...

Institute of Museum and Library Services

Across the country, colleges and universities are struggling to meet demand for accessible forms of course materials for students with an array of disabilities. At present, each institution is addressing this problem individually, at great expense, and often without full campus coordination, much less consortial collaboration. Locating digital files is difficult and entails numerous sources. The resulting accessibility enhancement/conversion work creates a large corpus of digital files in varying forms to manage on each campus. Over the course of one year, this planning project will bring together experts from disability/accessibility services with librarians, IT professionals, advocates, and legal counsel, to develop shared infrastructure within which universities can support their...

Andrew W. Mellon Foundation

“Understanding the Needs of Scholars in a Contemporary Publishing Environment,” better known as Publishing Without Walls (PWW), is a digital scholarly publishing initiative that is scholar-driven, openly accessible, scalable, and sustainable. PWW will directly engage with scholars throughout the research process. It aims to build publishing models that can be supported locally by a university’s library, while also opening new avenues toward publication through university presses and other publishers. PWW is here to help scholars navigate the new opportunities presented by collaborative, multimodal, and interim-phase works. PWW is launching two new series: one focusing on the outcomes of the Humanities Without...


The HathiTrust has provided funding for the HathiTrust Research Center (HTRC), co-located at the University of Illinois and Indiana University, to serve as the research arm of the HathiTrust and create an agile, technology-rich service for researchers in the digital humanities, social sciences, natural sciences, and informatics. This service will help researchers conduct nonconsumptive research on the HathiTrust digital library database, a collection of just under 14 million digitized volumes, equating to 4.9 billion pages, 60% of which is under some copyright restriction. At the same time, center staff will develop and refine tools to aid in digital humanities and text mining research over large databases and will operate the secure, large-scale computation environment required by this...

Andrew W. Mellon Foundation

This project builds upon, extends, and integrates two developmental research threads within the HathiTrust Research Center (HTRC). The first thread originates from work that was conducted in the Workset Collections for Scholarly Analysis (WCSA): Prototyping Project. The second thread continues the work of the Data Capsules (DC) project, previously supported by the Alfred P. Sloan Foundation (2011-2014). The primary objective of the WCSA+DC project is the seamless integration of the workset model and tools with the Data Capsule framework to provide non-consumptive research access to HathiTrust's massive corpus of data objects, securely and at...


An ever-increasing fraction of research is dependent on software, much of it developed in academia. But the developers are often not recognized or rewarded for their contributions in academic systems. In addition to recognition, resources are needed to sustain research software: to continue to make it available in the future, on new platforms, meeting new needs. This project examines both aspects: measuring and assigning credit for software, often via citation, and models for sustaining software projects. In addition, the concept of ...


How can we use user-generated content to construct, infer, or refine network data? We have been tackling this problem by leveraging communication content produced and disseminated in social networks to enhance graph data. For example, we have used domain-adjusted sentiment analysis to label graphs with valence values in order to enable triadic balance assessment. The resulting method enables fast and systematic sign detection, eliminates the need for surveys or manual link labeling, and reduces issues with leveraging user-generated (meta)data.
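
As a rough illustration of the triadic balance assessment mentioned above, the following Python sketch labels edges with a sign and then classifies closed triads. It is not the team's method: the sentiment scores, the zero threshold, and the names `sign_from_score` and `triad_balance` are all hypothetical, and the real work uses domain-adjusted sentiment analysis that is not reproduced here. A closed triad is treated as balanced when the product of its three edge signs is positive.

```python
from itertools import combinations


def sign_from_score(score, threshold=0.0):
    """Map a sentiment score to an edge sign (+1 or -1).

    Hypothetical rule: scores at or above the threshold count as positive.
    """
    return 1 if score >= threshold else -1


def triad_balance(signed_edges):
    """Classify every closed triad in an undirected signed graph.

    signed_edges maps frozenset({u, v}) to +1 or -1.
    Returns (balanced, unbalanced) lists of node triples.
    """
    nodes = sorted({node for edge in signed_edges for node in edge})
    balanced, unbalanced = [], []
    for a, b, c in combinations(nodes, 3):
        pairs = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if not all(pair in signed_edges for pair in pairs):
            continue  # open triad: at least one edge is missing
        product = 1
        for pair in pairs:
            product *= signed_edges[pair]
        (balanced if product > 0 else unbalanced).append((a, b, c))
    return balanced, unbalanced


# Made-up sentiment scores for messages exchanged between pairs of users.
scores = {
    ("ann", "bob"): 0.8,
    ("bob", "cat"): -0.6,
    ("ann", "cat"): -0.4,
    ("cat", "dan"): 0.5,
    ("bob", "dan"): 0.7,
}
edges = {frozenset(pair): sign_from_score(s) for pair, s in scores.items()}
balanced, unbalanced = triad_balance(edges)
print("balanced:", balanced)
print("unbalanced:", unbalanced)
```

In this toy graph, the ann-bob-cat triad has one positive and two negative edges and is balanced, while bob-cat-dan has a single negative edge and is unbalanced.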

Intelligent Medical Objects

How accurate and suitable are current solutions? How can they be improved? We evaluate the coverage and accuracy of various medical terminologies, and test strategies for increasing the precision of mapping medical reports to standardized terminologies. 


Oct. 3, 2017

Professor and Center for Informatics Research in Science and Scholarship (CIRSS) Director Bertram Ludäscher and collaborators are presenting their joint work and tools for data quality, cleaning, and provenance at the 33rd Annual Biodiversity Information Standards conference, TDWG 2017, from October 1-6 in Ottawa, Canada. The annual conference provides a forum for developing standards and demonstrating new technologies and tools for biodiversity informatics. This year's theme is "Data Integration in a Big Data Universe: Associating Occurrences with Genes, Phenotypes, and Environments."

Three of the abstracts presented at TDWG 2017 are outcomes of the Kurator project, a collaboration between Illinois and the Museum of Comparative Zoology (MCZ) at Harvard University. Kurator is a suite of biodiversity...

Aug. 25, 2017

Thanks to a new online resource for paleoenvironmental data and models under development at Illinois and partner institutions, historian Richard Flint can gauge whether environmental factors played an important role in driving the migration of Pueblo Indians from the Spanish province of New Mexico in the seventeenth century. Using SKOPE (Synthesizing Knowledge of Past Environments), scholars such as Flint and the larger community of archaeologists will be able to discover, explore, visualize, and synthesize knowledge of environments in the recent or remote past.

"We are aiming to support different types of users—from researchers asking fundamental questions in the historical social sciences using climate retrodictions from tree-ring...

Aug. 8, 2017

A project led by iSchool Professor Les Gasser, "Simulating Social Systems at Scale (SSS)," has laid the groundwork for a prestigious award to a student researcher. Santiago Núñez-Corrales, an Informatics PhD student directed by Gasser, was recently chosen from among several hundred applicants to receive an ACM SIGHPC/Intel Computational & Data Science Fellowship, worth $15,000 per year for at least three years. 

Gasser's SSS project, which earned a 2016-2017 Faculty Fellowship from the National Center for Supercomputing Applications (NCSA), demonstrates new approaches to building very large computer models of social phenomena such as social change, the emergence of organizations, and the evolution of language and information. The project also explores new ways of connecting "live" social data to running simulations and new ways of visualizing social processes.

Núñez-Corrales is working on multidisciplinary problems in the project with three elements: (1)...

Nov. 9, 2016

Assistant Professor Jana Diesner will moderate a panel on Natural Language Processing (NLP) at the Big Data Summit on November 10 in Champaign. The annual summit brings together experts from the University of Illinois Research Park, industry, and academia to share knowledge about big data and its business applications through panel discussions, keynote presentations, and networking opportunities.

Speakers for the NLP panel will include Peter Ghavami from CEB, a global best practice insights and technology company; Allen Murvine from IMO, which provides services for the capture and management of clinical information; TJ Tang from AbbVie, a pharmaceutical research and development company; and Dan Shalmon from the Cline Center for Democracy. The panel will address the following:

In the field of NLP, people develop and test theories, algorithms, methods and technologies for making sense of...

Oct. 7, 2016

Assistant Professor Vetle Torvik has been named the iSchool's Centennial Scholar for 2016-2017. The Centennial Scholar award is endowed by alumni and friends of the School and given in recognition of outstanding accomplishments and/or professional promise in the field of library and information science.

Torvik expressed surprise and gratitude at receiving this honor. "I am in awe of colleagues who received it before me; their caliber is off the charts," he said. "I hope to use the award to open new doors: a stamp of approval from colleagues who know you well goes a long way to establish new collaborations necessary to solve the increasingly complex problems facing science and society today."

Torvik joined the faculty in 2011. His current research addresses problems related to scientific discovery and collaboration using complex models and large-scale bibliographic databases. He is the author of articles in journals such as Proceedings of the National Academy of...

Jun. 24, 2016

Developed in the 1940s and 1950s, nuclear magnetic resonance (NMR) spectroscopy measures physical and chemical properties of atoms or molecules by measuring change in the magnetic resonance of the nuclei of atoms. The process is used by scientists for a variety of applications, such as substance identification. In biomolecular science, NMR supports discovery and identification of new drugs, disease and metabolic research, study of structural biology, and more.

Advances in computational applications and data-sharing tools have opened new doors for use of information gleaned from NMR spectroscopy, but new challenges have emerged as well. Its varied applications rely on myriad software tools that come from a range of sources and use a variety of semantic approaches. This complicates data management, inhibiting dissemination and reproduction of important findings.

A research team based at the iSchool at Illinois, the University of Wisconsin (UW), and the...

Jun. 10, 2016

The National Science Foundation (NSF) has awarded $5M to the "Whole Tale" project, led by Professor Bertram Ludäscher (PI) along with CIRSS affiliate Matthew Turk (co-PI, NCSA) and Associate Professor Victoria Stodden (co-PI). The five-year NSF Data Infrastructure Building Blocks (DIBBs) project will create methods and tools for scientists to link executable code, data, and other information directly to online scholarly publications, with the aim of helping to ensure reproducibility and pave the way for new discoveries. Project partners include co-PIs at the University of Chicago, the Texas Advanced Computing Center, the University of California, Santa Barbara, and the University of Notre Dame.

Further details follow below from the official press release.