Torvik develops dataset, tools for study of innovation and aging

Vetle Torvik, Associate Professor

As society ages and human knowledge progresses, we expect innovations from scientists that will improve quality of life for older adults and help society adapt to the realities of a changing population. Yet the scientific workforce itself is aging, potentially affecting its own capacity for innovation. The relationship between innovation in the scientific workforce and the increasing demand for innovation is complex, with far-reaching implications for everything from healthcare to the economy.

With the aim of shedding light on the specific impacts of this relationship, the National Bureau of Economic Research has embarked on a project, Innovation in an Aging Society (IAS). IAS involves experts in economics, information science, and neuroscience, including GSLIS Assistant Professor Vetle Torvik. The IAS team is taking a close look at the relationship between aging and innovation, in terms of how the capacity for innovation changes throughout the human lifecycle, how society will be affected by the aging of the biomedical research workforce, and how the aging of society may increase demand on the biomedical research industry. The project began in 2013 and has received $4.5 million in funding from the National Institutes of Health.

Torvik is a subcontract principal investigator focusing on one piece of this puzzle. He and his team—including Neil Smalheiser, associate professor in the Department of Psychiatry at the University of Illinois at Chicago, GSLIS doctoral student Shubhanshu Mishra, and informatics doctoral candidate Brent Fegley—are tackling the technical hurdles that will allow discovery of patterns hidden in myriad metadata regarding the biomedical scientific workforce.

“Vetle’s contribution is to provide one of the most critical building blocks for this work,” said IAS Principal Investigator Bruce Weinberg, professor of economics at Ohio State University. “Innovations are produced by people, and so it is essential to know who is who, where they are located, and information about them like gender, race, and ethnicity. His work has also generated important understandings of networks and the novelty of research.”

Torvik’s team is developing a large-scale, disambiguated set of data drawn from databases of publications, patents, and grants and building tools for navigating the data. Using these tools, other IAS researchers will be able to analyze the information, revealing trends that may provide insight into innovation across scientists’ careers, such as the points in their personal and professional lives at which scientists are most prolific.

“The main problem we’re trying to solve is ambiguity in author names . . . and we’re trying to do this on a large scale,” explained Torvik. “The other [problem] is doing this across bibliographic databases so that we can link people and get a better picture of some of the many different things that scientists do. It is not surprising that you will get the wrong picture if you assume the name J. A. Smith corresponds to one person. What is surprising is that scholars often make that assumption. We recently had a paper in PLOS ONE that characterized the degree to which coauthorship networks are distorted if one fails to disambiguate.”
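The distortion Torvik describes can be illustrated with a toy sketch. The records and person identifiers below are hypothetical, invented only to show how treating a name string as a single person changes a coauthorship network; this is not the team's data or method.

```python
from itertools import combinations

# Hypothetical toy records: each paper lists (name_string, true_person_id).
# The person ids are invented for illustration only.
papers = [
    [("J. A. Smith", "smith_chem"), ("L. Chen", "chen_1")],
    [("J. A. Smith", "smith_neuro"), ("M. Garcia", "garcia_1")],
    [("J. A. Smith", "smith_neuro"), ("R. Patel", "patel_1")],
]

def coauthor_graph(papers, key):
    """Build an undirected coauthorship graph as {node: set(neighbors)}."""
    graph = {}
    for authors in papers:
        nodes = [key(a) for a in authors]
        for n in nodes:
            graph.setdefault(n, set())
        for a, b in combinations(nodes, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

# Naive network: every "J. A. Smith" is assumed to be the same person.
by_name = coauthor_graph(papers, key=lambda a: a[0])
# Disambiguated network: nodes are distinct people.
by_person = coauthor_graph(papers, key=lambda a: a[1])

print(len(by_name), len(by_name["J. A. Smith"]))      # nodes, Smith's coauthors
print(len(by_person), len(by_person["smith_neuro"]))  # nodes, one Smith's coauthors
```

In the naive network a single "J. A. Smith" appears to have three coauthors and links L. Chen to the other two researchers. In the disambiguated network there are two different Smiths, and L. Chen has no connection to M. Garcia or R. Patel.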

By improving disambiguation of author names and connecting individual scientists to publishing metadata, institutional affiliation data, and demographics (and scaling the process to encompass thousands of scientists), the IAS team can begin to identify correlations between the human lifecycle and innovation in biomedical research.

“We can get eighty percent accuracy pretty easily, but we’re aiming at ninety-eight percent. We use sophisticated statistical models and harvest a lot of supplemental information to accomplish this. And then we apply these algorithms across large-scale bibliographic datasets. The resulting publications, data, and online tools are where we see a lot of impact,” he said.
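As a rough idea of what "supplemental information" can mean in practice, the sketch below scores whether two records sharing a name belong to the same person, using coauthor overlap and affiliation match. The features, weights, and records are all assumptions chosen for illustration; the article does not describe the team's actual statistical models, and this is not a reconstruction of them.

```python
def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def same_author_score(rec1, rec2, weights=(0.6, 0.4)):
    """Weighted mix of coauthor overlap and affiliation match.

    The weights are arbitrary illustrative values, not fitted parameters.
    """
    w_co, w_aff = weights
    co = jaccard(rec1["coauthors"], rec2["coauthors"])
    aff = 1.0 if rec1["affiliation"] == rec2["affiliation"] else 0.0
    return w_co * co + w_aff * aff

# Hypothetical records that all carry the same name string.
r1 = {"name": "J. A. Smith", "coauthors": {"L. Chen", "M. Garcia"}, "affiliation": "Univ A"}
r2 = {"name": "J. A. Smith", "coauthors": {"M. Garcia", "R. Patel"}, "affiliation": "Univ A"}
r3 = {"name": "J. A. Smith", "coauthors": {"K. Olsen"}, "affiliation": "Univ B"}

score_12 = same_author_score(r1, r2)  # shared coauthor and same affiliation
score_13 = same_author_score(r1, r3)  # nothing in common beyond the name
print(score_12, score_13)
```

A real system would draw on many more signals and a calibrated model, but even this toy score separates the pair with shared evidence from the pair that shares only a name.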

After disambiguation, a suite of tools is used to elucidate characteristics of scientists and their output. Torvik’s geocoding tool, MapAffil, infers geographic location and type of institution worldwide, and links to demographic data from the US census; Genni predicts author gender and works well even for names rarely seen in the US; Ethnea imputes ethnicity; and Patci is a probabilistic citation matcher that includes links from patents to papers. The disambiguated linked data and tools are all available online at http://abel.lis.illinois.edu.

“We’ll have a person-centered, longitudinal dataset that uses papers, patents, grants, and dissertations to map out individuals’ careers by what they work on, where they are located, who they work with, and when. And we are studying a variety of complex social phenomena including collaboration, diversity, mobility, scientific reliability, impact, productivity, and how these have changed over the years and were influenced by ‘external shocks’ like changes in government policy on science funding,” said Torvik.

Torvik’s foundational work will facilitate the larger project’s efforts to predict effects of the aging of the scientific workforce and make policy-related recommendations. The data and tools developed by Torvik’s team will also be made freely available to the wider research community and maintained in perpetuity, enabling further research at the intersection of aging and innovation.

Torvik is an assistant professor at GSLIS. His areas of expertise include mathematical optimization, computational statistics, text and data mining, literature-based discovery, and bioinformatics. He teaches courses on those topics, as well as informetrics and information processing. Torvik earned a BA in mathematics from St. Olaf College, an MS in operations research from Oregon State University, and a PhD in engineering science from Louisiana State University.

This article initially appeared in the Fall 2015 issue of Intersections magazine.

