Doctoral candidate Shubhanshu Mishra successfully defended his dissertation, "Information Extraction from Digital Social Trace Data with Applications to Social Media and Scholarly Communication Data," on June 24.
His committee included Associate Professor Jana Diesner, chair and director of research; Associate Professor Vetle Torvik; Karrie Karahalios, iSchool affiliate and professor of computer science; and Robert J. Brunner, professor of accountancy.
From the abstract: Information extraction aims to derive structured data from an unstructured or semi-structured data set. The thesis starts by identifying social media data and scholarly communication data as special cases of digital social trace data (DSTD). This identification allows us to utilize the graph structure of the data (e.g., a user connected to a tweet, an author connected to a paper, or an author connected to other authors) to develop new information extraction tasks. The thesis focuses on information extraction from DSTD, first using only the text data from tweets and scholarly paper abstracts, and then using the full graph structure of Twitter and scholarly communication corpora. This thesis makes three major contributions. First, it introduces methods for extracting information from social media and scholarly data. Second, it introduces new categories of information extraction. Finally, it has resulted in the creation of multiple open source tools and public data sets that can be utilized by the research community.