Collaborative Research: ABI Development: Kurator: A Provenance-enabled Workflow Platform and Toolkit to Curate Biodiversity Data
Total Funding to Date
- Bertram Ludäscher
Data curation is a critical step in the digitization, sharing, integration, and use of scientific data. The considerable resources allocated to digitizing natural science collections in the U.S. and globally call for attention both to digitization efficiency and to the utility of the resulting data. One way to address both issues is to use workflow software to automate and streamline data curation. We are developing Kurator, a suite of biodiversity data quality tools aimed at three audiences: collection management specialists with little or no programming experience; database administrators and researchers with some scripting experience; and software developers.

One of these tools is Kurator-Akka, which can be run either as a command-line application or as a web-based data quality application. It is designed to be accessible to data curators through a web interface, to more advanced users through editable configuration files, and to programmers who want to extend its functionality or develop new modules (actors). Behind the scenes, and typically invisible to users of the web interface, Kurator-Akka runs workflows defined in YAML. Workflows can invoke actors written in Java or Python, and the Kurator-Akka framework manages the dataflow between actors.

One of our goals is to let users build data quality workflows in a drag-and-drop interface that generates the YAML configuration files behind the scenes; these files can then be executed through the web interface, or downloaded, edited, and run locally by users with some scripting experience. Another goal is to enable others to write new actors (e.g., in Python) that interoperate easily with the actors we provide. We also plan to provide means for sharing these actors and example curation workflows with the community.
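The actor-and-dataflow design described above can be illustrated with a minimal Python sketch. It is not Kurator-Akka's API: the `WORKFLOW` dictionary stands in for a parsed YAML file, and its keys (`actors`, `name`, `type`), the actor functions, the `ACTOR_TYPES` registry, and `run_workflow` are all hypothetical. Each actor is a generator that consumes a stream of records and yields records, and the runner chains the actors in the order the workflow lists them. This is a simplified, single-threaded stand-in for the dataflow management the framework provides.

```python
import re
from datetime import date
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, str]

# Hypothetical workflow definition, roughly what a YAML file might hold
# once parsed. These keys are illustrative, NOT the Kurator-Akka schema.
WORKFLOW = {
    "actors": [
        {"name": "reader", "type": "read_records"},
        {"name": "date_check", "type": "validate_event_date"},
        {"name": "collector", "type": "collect"},
    ]
}

# Simplified check: only complete YYYY-MM-DD dates are accepted.
DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_iso_date(value: str) -> bool:
    """True if value has the form YYYY-MM-DD and is a real calendar date."""
    if not DATE_RE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)  # rejects e.g. 2014-02-30
    except ValueError:
        return False
    return True


def read_records(records: Iterable[Record]) -> Iterator[Record]:
    """Source actor: emits a copy of each input record."""
    for rec in records:
        yield dict(rec)


def validate_event_date(records: Iterable[Record]) -> Iterator[Record]:
    """Data quality actor: annotates each record with an eventDate status."""
    for rec in records:
        ok = is_iso_date(rec.get("eventDate", ""))
        rec["eventDate_status"] = "VALID" if ok else "INVALID"
        yield rec


def collect(records: Iterable[Record]) -> Iterator[Record]:
    """Sink actor: passes records through so the runner can gather them."""
    yield from records


# Hypothetical registry mapping actor "type" names to Python callables.
ACTOR_TYPES: Dict[str, Callable[[Iterable[Record]], Iterator[Record]]] = {
    "read_records": read_records,
    "validate_event_date": validate_event_date,
    "collect": collect,
}


def run_workflow(spec: dict, records: Iterable[Record]) -> List[Record]:
    """Chain actors in listed order; each consumes the previous one's output."""
    stream: Iterable[Record] = records
    for actor in spec["actors"]:
        stream = ACTOR_TYPES[actor["type"]](stream)
    return list(stream)


if __name__ == "__main__":
    data = [
        {"catalogNumber": "specimen-1", "eventDate": "2014-07-15"},
        {"catalogNumber": "specimen-2", "eventDate": "15/07/2014"},
    ]
    for rec in run_workflow(WORKFLOW, data):
        print(rec["catalogNumber"], rec["eventDate_status"])
```

Running the script prints `specimen-1 VALID` and `specimen-2 INVALID`. The date check is deliberately simplified: it accepts only complete YYYY-MM-DD dates, whereas the Darwin Core `eventDate` term also permits partial dates and date ranges. Because the runner looks actors up only by name, a new check can be added by writing one more generator function and registering it, which loosely mirrors the goal of letting others contribute interoperable actors.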
The Kurator software is open source and available on GitHub at https://github.com/kurator-org.
- James Hanken (PI, Harvard)
- James Macklin (Co-PI, Agriculture and Agri-Food Canada)
- National Science Foundation, 2014 – $748,931.00