Diesner and Mishra publish paper on NER tool for social media research

Jana Diesner, Associate Professor and PhD Program Director
Shubhanshu Mishra

The identification of proper names of people, organizations, and locations from raw texts, referred to as Named Entity Recognition (NER), can be highly accurate when researchers use NER tools on a large collection of text with proper syntax. However, using existing NER tools for analyzing social media text can lead to poor identification of named entities. In particular, Twitter text frequently includes inconsistent capitalization, spelling errors, and shortened versions of words.

TwitterNER, an open-source tool developed by doctoral student Shubhanshu Mishra, who is supervised by Assistant Professor Jana Diesner, can help researchers who want to perform NER on social media text. In a recent independent evaluation by HumanGeo, TwitterNER achieved higher precision than some other publicly available systems on the person, location, and organization entity types, which are often of most interest to researchers.

"Our system relies on a combination of hand-engineered features," explained Mishra. "It follows the paradigm of transductive semi-supervised learning where all the labeled and unlabeled data is utilized to make predictions about the unlabeled data."
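
To make the idea in Mishra's description concrete, the following is a minimal Python sketch of transductive self-training on toy data. It is not TwitterNER's code: the two features, the vote-counting classifier, the confidence threshold, and the example tweets are all assumptions chosen for illustration. Capitalization is deliberately left out of the features, since it is unreliable on Twitter.

```python
from collections import Counter, defaultdict

def token_features(tokens, i):
    """Toy hand-engineered features for tokens[i] (capitalization is ignored)."""
    feats = ["lower=" + tokens[i].lower()]
    if i > 0:
        feats.append("prev=" + tokens[i - 1].lower())
    return feats

def train(examples):
    """Count how often each feature co-occurs with each label."""
    counts = defaultdict(Counter)
    for feats, label in examples:
        for f in feats:
            counts[f][label] += 1
    return counts

def predict(counts, feats):
    """Return (label, confidence); tokens with no evidence default to "O"."""
    votes = Counter()
    for f in feats:
        votes.update(counts.get(f, {}))
    if not votes:
        return "O", 0.0
    label, top = votes.most_common(1)[0]
    return label, top / sum(votes.values())

def transductive_self_training(labeled, unlabeled, threshold=0.6, rounds=3):
    """Label the given unlabeled tokens, reusing confident guesses as training data."""
    pseudo = {}  # index into `unlabeled` -> pseudo-label
    for _ in range(rounds):
        model = train(labeled + [(unlabeled[i], y) for i, y in pseudo.items()])
        for i, feats in enumerate(unlabeled):
            if i not in pseudo:
                label, conf = predict(model, feats)
                if conf >= threshold:
                    pseudo[i] = label
    model = train(labeled + [(unlabeled[i], y) for i, y in pseudo.items()])
    return [predict(model, feats)[0] for feats in unlabeled]

# Hypothetical toy data: "O" marks a non-entity token, "LOC" a location.
labeled_tweets = [
    (["landed", "in", "Boston"], ["O", "O", "LOC"]),
    (["lunch", "in", "Paris"], ["O", "O", "LOC"]),
    (["traffic", "is", "bad"], ["O", "O", "O"]),
]
unlabeled_tweets = [["stuck", "in", "denver"], ["denver", "is", "cold"]]

labeled = [(token_features(toks, i), tags[i])
           for toks, tags in labeled_tweets for i in range(len(toks))]
unlabeled = [token_features(toks, i)
             for toks in unlabeled_tweets for i in range(len(toks))]

baseline = [predict(train(labeled), f)[0] for f in unlabeled]
transductive = transductive_self_training(labeled, unlabeled)
print(baseline)      # labeled data only
print(transductive)  # labeled data plus the unlabeled tweets themselves
```

In this toy setup, the labeled data alone tags "denver" as a location only when it follows "in". Once that confident guess is fed back in as a pseudo-label, the sentence-initial "denver" in the second tweet is tagged as a location too.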

The original implementation of TwitterNER was created for the shared-task session of the Workshop on Noisy User-generated Text (W-NUT) at the 2016 International Conference on Computational Linguistics (COLING). Workshop participants were asked to build an NER system for Twitter data, which was evaluated using a common test dataset. TwitterNER had a high level of precision among the various systems.

Diesner and Mishra then improved their approach and shared it with W-NUT by submitting the paper, "Semi-supervised Named Entity Recognition in noisy-text."

"Our original submission ranked seventh in the task, but our final improved version surpassed the second-best performing system on the concluded task," said Mishra. "The winning system was based on deep learning, but its implementation is not publicly available."

Mishra has an integrated MS and BS in mathematics and computing from the Indian Institute of Technology Kharagpur. He is interested in the analysis of information generation in social networks such as those in scholarly data and social media websites. His prior projects have included systems for user sentiment profiling, active learning using a human-in-the-loop design pattern, and novelty profiling in scholarly data.

Diesner is an expert in human-centered computing, network science, natural language processing, and machine learning. Recognition for her research expertise includes appointments as CIO Scholar for Information Research & Technology at Illinois (2018), as a faculty fellow at the National Center for Supercomputing Applications (NCSA) at Illinois (2015), and as a research fellow in the Dori J. Maynard Senior Research Fellows program through The Center for Investigative Reporting and The Robert C. Maynard Institute for Journalism Education (2016). She holds a PhD from the Computation, Organizations and Society (COS) program at Carnegie Mellon University's School of Computer Science.


Related News

Chan joins iSchool faculty

The iSchool is pleased to announce that Anita Say Chan has joined the faculty. She also holds a joint appointment with the College of Media, where she is an associate professor of communications in the Department of Media and Cinema Studies.

Anita Say Chan

Downie to give keynote at digital scholarship symposium

Professor and Associate Dean for Research J. Stephen Downie will be the keynote speaker for Digital Scholarship Symposium 2019, which will be held on March 19 at The Chinese University of Hong Kong (CUHK). The theme of this year's symposium is "(Re-)Mining Text: From Traditional to Digital." Co-organized by the Hong Kong Literature Research Centre and CUHK Library, the event aims to explore techniques and applications of text mining in the era of digital scholarship.

J. Stephen Downie, Professor and Associate Dean for Research

Jihan receives scholarship to attend PyCon 2019

MS/LIS student Itzel Jihan has been awarded a scholarship to attend the PyCon 2019 conference, which will be held May 1-9 in Cleveland, Ohio. The conference is the largest annual gathering for the community using and developing the open-source Python programming language. It includes tutorials, talks, events such as a poster session and job fair, and sprints, where developers collaborate on open-source projects.

Itzel Jihan

Bonn to present research at NFAIS 2019 Humanities Roundtable

Associate Professor Maria Bonn will discuss Publishing Without Walls (PWW) at the National Federation of Advanced Information Services (NFAIS) 2019 Humanities Roundtable, which will be held on March 10 in Washington, D.C. The topic of this year's program is "Evaluation of Digital Scholarship in the Humanities and Its Impact." It will address the skills, tools, and resources required for digital humanities evaluation as well as how publishers, libraries, and content aggregators can better support digital humanities.

Maria Bonn

Ferreira appreciates scholarship, Leep program flexibility

For Kelly Ferreira, receiving a Leep Scholarship has made "a world of difference." When she enrolled in the MS/LIS program, she was employed at the Ela Area Public Library in Lake Zurich, Illinois. However, she had to leave her position when she recently moved with her partner across the country. Now working as a library technician at Eastern Florida State College's Palm Bay campus, Ferreira appreciates the flexibility of the Leep online option as well as her scholarship.

Kelly Ferreira