Lan Jiang's Preliminary Exam

PhD candidate Lan Jiang will present her dissertation proposal, "Effective and Fair Writing Assessment with Large Language Models." Jiang's preliminary examination committee includes Assistant Professor Nigel Bosch, Associate Professor Halil Kilicoglu, Associate Professor Dong Wang, and Associate Professor Vetle Torvik.

Abstract

Text assessments serve many purposes: measuring students' achievement, evaluating the quality of reporting, and even detecting suicide risk. Traditionally, these assessments are performed by experts who manually review the text. However, human examination is time-consuming, demands expertise, and suffers from inter-rater agreement concerns and external factors such as fatigue-induced inconsistency. With advances in artificial intelligence technologies such as machine learning and natural language processing, researchers have attempted to build systems that automate the text assessment process. These systems benefit both writers (i.e., students and authors) and assessors (i.e., educators and editors). For writers, immediate assessments can help them adjust their writing or verify their understanding of core ideas. For assessors, text assessment systems can ease their workload by allowing them to assess large volumes of writing with speed, consistency, and accuracy.

In the education domain, most existing systems focus on correcting grammatical errors and assessing cohesion rather than validating arguments and assessing the correctness of the content. In the biomedical domain, few tools are available for text assessment because annotated training data is scarce, owing to the expert-demanding and time-consuming nature of the annotation task. Additionally, privacy and fairness concerns limit the use of text assessment systems in high-stakes evaluations. One key issue is that text can reveal personal identities, potentially enabling reidentification of individuals; as a result, few datasets are publicly available in domains such as education and health. Moreover, researchers have uncovered significant gender bias in essay assessment. Their findings suggest that the academic writing style favored in assessments tends to be assertive, self-confident, and bold, traits often associated with male writers, while female writers may be perceived as too cautious or unassertive. Machine learning models can inherit these biases from their training data.

In this proposal, I aim to investigate the effectiveness of large language models (LLMs) in text assessment, develop a method to remove gender indicators from text, and explore how gender-neutral text affects assessment performance. In particular, I investigate the use of LLMs in the educational and biomedical domains. Overall, the goal of this proposal is to build effective and fair text assessment models with LLMs.

Questions? Contact Lan Jiang.