Yuanxi Fu's Preliminary Exam

PhD student Yuanxi Fu will present her dissertation proposal, "Unreliability Propagation in Science: Conceptual Foundations and Mitigation Measures." Her committee includes Associate Professor Jodi Schneider, Assistant Professor Nigel Bosch, Associate Professor Peter Darch, and Professor Bertram Ludäscher.
Abstract
Contemporary scientific research is a collaborative enterprise. Scientists build upon each other’s work. They also rely heavily on resources developed by others, from cell lines to datasets and computer programs. The interdependencies of research outputs are inevitable because of the specialization of scientists and are necessary for overall efficiency. Scientists delegate a (crucial) part of their intellectual work to fellow scientists, which comes with a price if such trust is misplaced. That price, diminished research quality, is paid not by the individual scientist but by the system of science, which makes the problem particularly alarming. In this dissertation, I will examine unreliability propagation (UP): the transmutation of one unreliable scientific object to another unreliable scientific object.
I will address three research questions. First, how can we theorize UP as an information phenomenon? Second, how can we triage citing publications based on their risk of propagating unreliability at scale (with an unreliable computational chemistry protocol as the source)? Third, why do researchers still choose to use a less reliable community algorithm when its improved version is available? To address RQ1, I will conceptualize UP using two approaches: (1) a reasoning-centric approach that models UP with argument graphs and (2) an object-centric approach that uses conceptual analysis to identify individually necessary and collectively sufficient conditions for a scientist (S) to use an unreliable scientific object A to create another unreliable scientific object B. Results from both approaches will be combined to disentangle the information-reasoning-thing entwinement in UP and provide a theorization of it.
To address RQ2, I will design and assess algorithmic approaches to triage publications based on their risk of further propagating unreliability. The resulting risk levels can be used to mitigate UP in digital libraries: digital libraries can flag publications based on the risk levels, and authors of high-risk publications can be notified to take action (e.g., check whether their results are affected and make necessary amendments). As a use case, I focus on publications citing a given computational chemistry protocol, one of whose Python scripts contains a code glitch. The protocol was chosen because of its influence and the relative simplicity of the unreliability: it is limited to one script and requires no chemistry domain knowledge to understand.
To address RQ3, I will analyze a corpus of publications citing both the unreliable Louvain algorithm and its improvement, the more reliable Leiden algorithm. I will extract the reasons authors gave for choosing Louvain, Leiden, or both. I will then associate these reasons (and their absence) to answer RQ3. This dissertation draws on theories and methods from three fields (information science, argumentation studies, and science of science) and seeks to inform the theory of information and the practice of scientific information management.