Special CIRSS Seminar: Peter Murray-Rust

Peter Murray-Rust will give the presentation "ContentMining and Knowledge-Driven Scholarship: The Right to Read is the Right to Mine."

Abstract: Five new scholarly articles and dissertations are published every minute, and most are crammed with valuable and exciting facts. We (in ContentMine and many other places) have developed tools to extract these facts automatically in any field: medicine, chemistry, tropical disease, astronomy, social science, history, architecture. Facts are supported by ontologies, and in Wikidata we now have the largest Open curated ontology anywhere: over 40 million precisely defined terms and data supporting all these fields and more.

Typically a researcher (our youngest so far is 15!) legitimately downloads perhaps 500 papers relating to their subject and then searches them for terms defined in Wikidata. What spreads Zika? If you didn't already know, you would be interested to find that Zika papers contain a lot about mosquitoes! The literature is automatically and gently teaching you new knowledge. It's not just words: our software can extract data from tables, diagrams, chemical formulae, semantic chemistry, evolutionary trees, statistical plots, etc. and automatically turn them into computable form.

This gives huge power to the researcher and their institution, but only if they are allowed to do it. Unfortunately, there has been massive opposition from publishers and no challenge from universities or libraries. In the UK we campaigned for the law to be changed, and it was (to a limited extent: for non-commercial use only), but no one has defended this right, and over the last 4 years the situation has got worse rather than better. UK libraries generally accede to publisher pressure and often restrict researchers. As a result, ContentMining (aka Text and Data Mining, TDM) has become an underground activity.

The US has to decide rapidly whether academia or publishers control this area. The best way forward now is to go out and do it; we have talked too much already. There are many Open toolsets (contentmine.org, R, Stanford) and many young people who know how to use them. I will demonstrate them and give general guidance.

My hope is that the doctrine of Fair Use in the US is powerful enough to take this forward. We can show that TDM would already save lives, and we should surely win in the court of public opinion. I am not a lawyer (IANAL), but I admire and am envious of what HathiTrust has achieved, and I hope I can find synergy during my visit.

Biosketch, from Wikipedia: Murray-Rust is a chemist currently working at the University of Cambridge. As well as his work in chemistry, Murray-Rust is also known for his support of open access and open data. He was educated at Bootham School and Balliol College, Oxford. After obtaining a Doctor of Philosophy, he became lecturer in chemistry at the (new) University of Stirling and was first warden of Andrew Stewart Hall of Residence. In 1982, he moved to Glaxo Group Research at Greenford to head Molecular Graphics, Computational Chemistry and later protein structure determination. He was Professor of Pharmacy in the University of Nottingham from 1996–2000, setting up the Virtual School of Molecular Sciences. He is now Reader Emeritus in Molecular Informatics at the University of Cambridge and Senior Research Fellow Emeritus of Churchill College, Cambridge. 

His research interests have involved the automated analysis of data in scientific publications, creation of virtual communities, e.g. the Virtual School of Natural Sciences in the Globewide Network Academy, and the Semantic Web. With Henry Rzepa, he has extended this to chemistry through the development of markup languages, especially Chemical Markup Language.[2] He campaigns for open data, particularly in science, and is on the advisory board of the Open Knowledge International and a co-author of the Panton Principles for Open scientific data.[3] Together with a few other chemists, he was a founder member of the Blue Obelisk movement in 2005.

This event is sponsored by CIRSS.