ERRT: Dmitry Mozzherin

At this meeting of the eResearch Roundtable, Dmitry Mozzherin, molecular biologist and biodiversity informatician at the Illinois Natural History Survey, will give the talk "How to index biological knowledge about species in one day?"

Abstract: For the last 250 years we have used binomial nomenclature to communicate information about animals, plants, and bacteria. The introduction of binomial nomenclature helped tremendously to expand our knowledge about life on our planet. The Biodiversity Heritage Library project has collected more than 50 million pages of this knowledge, spanning several hundred years. To work with this massive amount of data, we need to find and organize the scientific names mentioned on each of these pages. The task is surprisingly complicated because, on average, there are 3 scientific names per species and about 50 different ways these names have been written. Global Names Architecture creates tools that reveal how all these various names and their spellings are connected, and that organize and disambiguate them. There are 3 stages in this disambiguation. The first is a lexical stage, in which spelling variants of a scientific name are organized into lexical groups. The second stage is nomenclatural and traces the evolution of a name in the scientific literature. The third, taxonomic stage finds the currently adopted name for a taxon. Global Names Architecture also develops tools for recognizing scientific names in texts, and our goal is to be able to go through all accumulated biological knowledge and index it in a matter of a day.

For fifteen years Mozzherin was a molecular biologist studying DNA replication and how various nucleotide analogs can be used to selectively switch off the DNA polymerases of viruses while leaving the human replication machinery intact. Later he became interested in the Open Source movement and learned programming. He worked at the Encyclopedia of Life project, which collects information about all species in the world, and for the last eight years he has been trying to figure out how to globally organize scientific information using scientific names as the glue. His other passions are wildlife photography and sculpture.

This event is sponsored by the Center for Informatics Research in Science and Scholarship (CIRSS).