Doctoral candidate Kanyao Han successfully defended his dissertation, "Natural Language Processing for Supporting Impact Assessment of Funded Projects," on January 7, 2025.
His committee included Jana Diesner (chair), affiliate associate professor in the iSchool and professor at the Technical University of Munich; Associate Professor Jodi Schneider; Associate Professor Halil Kilicoglu; and Daniel C. Miller, associate professor of environmental policy in the Keough School of Global Affairs at the University of Notre Dame.
Abstract: Funding from organizations plays a crucial role in supporting researchers and practitioners in advancing scientific knowledge, promoting societal progress, and protecting the environment. This raises two critical questions: (1) How do organizations allocate their funding across various projects and fields? (2) Do these funded projects lead to significant outcomes and impacts? Addressing these questions requires a comprehensive analysis of text-based data documenting funding, outcomes, and impacts, including project reports submitted to funders and outcomes published in research articles. However, annotating and analyzing text-based data can be both costly and time-consuming, because researchers must navigate lengthy and large-scale datasets to identify meaningful information for analysis. This dissertation aims to leverage Natural Language Processing (NLP) and Machine Learning (ML) to help researchers and administrative staff manage text-based data more efficiently. By automating or semi-automating processes such as information extraction, data cleaning, and classification, this work seeks to reduce the workload associated with data processing and annotation. This dissertation explores how NLP and ML techniques can be developed and used to handle data under three challenging conditions: (1) disorganized, complex, lengthy, or incomplete datasets; (2) limited availability of annotated data; and (3) the need for domain-specific analysis schemas. By addressing these challenges, this dissertation proposes innovative approaches to support the analysis of funding allocation and the assessment of the impact of funded projects.
This dissertation contributes to (1) developing novel frameworks for cleaning, annotating, and extracting valuable information from publication records and project reports; (2) providing insights into funding allocation in scientific research and biodiversity conservation; and (3) enhancing the understanding of the impacts generated by funded projects.