Doctoral student Craig Willis has received funding from the National Institutes of Health (NIH) to work with the biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE)/DataMed team on a pilot project this summer. The award is based on his participation in the 2016 bioCADDIE Dataset Retrieval Challenge, which asked participants to develop innovative ways for biomedical researchers to search for and discover research data.
"bioCADDIE is an NIH Big Data to Knowledge (BD2K) project to develop the DataMed [2] system, sometimes described as the 'PubMed of data,'" said Willis. "At the end of the challenge, they awarded two subcontracts. The goal of my project is to prototype and evaluate expansion models for integration into the DataMed system."
According to Willis, searching for biomedical information often requires specialized language or vocabularies that users' queries do not always reflect. Feedback-based query expansion models alleviate this problem and improve search engine performance by automatically adding terms related to the user's information need to the original query. His project will demonstrate how expanding the query with information from external collections such as PubMed can improve overall retrieval effectiveness.
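For readers unfamiliar with the technique, the sketch below illustrates the general idea of feedback-based expansion against an external collection. It is a simplified form of pseudo-relevance feedback, not the expansion models Willis's project is building: the function names are hypothetical, the scorer is a toy term-count ranker, and the small in-memory `collection` stands in for an external source such as PubMed. The query is run against the external collection, the top-ranked documents are assumed to be relevant, and their most frequent new terms are appended to the query.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query_terms, collection, k):
    """Toy ranker (hypothetical): score each document by query-term occurrences."""
    scored = []
    for doc in collection:
        tokens = tokenize(doc)
        score = sum(tokens.count(term) for term in query_terms)
        if score > 0:
            scored.append((score, doc))
    # Stable sort: documents with equal scores keep their collection order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def expand_query(query, external_collection, k=2, n_terms=3, stopwords=frozenset()):
    """Pseudo-relevance feedback: append frequent terms from the top-k external documents."""
    query_terms = tokenize(query)
    feedback_docs = retrieve(query_terms, external_collection, k)
    counts = Counter()
    for doc in feedback_docs:
        # Count candidate expansion terms, skipping stopwords and original query terms.
        counts.update(t for t in tokenize(doc) if t not in stopwords and t not in query_terms)
    expansion = [term for term, _ in counts.most_common(n_terms)]
    return query_terms + expansion


# Hypothetical stand-in for an external collection such as PubMed abstracts.
collection = [
    "gene expression profiling of breast cancer tumors",
    "tumor gene expression microarray data",
    "protein structure database",
]
expanded = expand_query("breast cancer gene", collection, k=2, n_terms=2,
                        stopwords={"of", "the", "and"})
print(expanded)  # ['breast', 'cancer', 'gene', 'expression', 'profiling']
```

Production models such as relevance models typically weight expansion terms and interpolate them with the original query rather than simply appending them; the unweighted version here only shows the feedback loop.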
Willis studies information retrieval with Associate Professor Miles Efron and also works as a research programmer at the National Center for Supercomputing Applications (NCSA) on the National Data Service project. Doctoral student Garrick Sherman, MS student Thuong Phan, and NCSA Research Programmer Mike Lambert are working with Willis on his project, "Expansion Models for Biomedical Data Search." The project is funded through a subaward from the University of California, San Diego that will cover $50,000 in direct costs.