Every day billions of queries are typed into search boxes on Google, Bing, Yahoo, and other search engines. Data centers around the world swell with vast amounts of information. Twitter and Facebook see a constant stream of activity.
We may not give it much thought when our fingers sweep rapidly over the keys looking for that article we heard about or directions to the restaurant, but searching through massive amounts of data is no small feat. Nor is the ability to produce an accurate search result, one that gets close to the core of what we are searching for.
As good as search can be, Miles Efron wants to make it better. Efron, an assistant professor at the Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign, is working on new algorithms that build upon the current strengths of search but add a new dimension: time.
“We are at a point where soon we won’t have the luxury of ignoring the temporal aspect of data,” said Efron. “In order for search to be successful, time has to make its way into search engines.” Efron’s three-year project is supported by a $408,908 grant from the National Science Foundation.
Temporal information about data is often available, but to date there has been no concerted effort to build technology that incorporates time into search. Most people, however, use time to decide whether a search result is relevant to their query. Sometimes what matters most is that a result be the most current information on a topic; at other times, users want results bounded by a particular time frame. Efron suggests that if we track the traces of information that are created as documents, collections, and language change over time, we will be better able to predict relevance, thus vastly improving search.
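The two uses of time described above, favoring recent results and restricting results to a time frame, can be sketched in a few lines of Python. This is only an illustration, not Efron's algorithm: the function names, the exponential decay, and the 30-day half-life are all assumptions chosen for the example, and `text_score` stands in for whatever relevance score a conventional search engine would already produce.

```python
from datetime import datetime

def recency_weight(doc_time, query_time, half_life_days=30.0):
    """Hypothetical recency prior: the weight halves every half_life_days."""
    age_days = max((query_time - doc_time).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / half_life_days)

def time_aware_score(text_score, doc_time, query_time,
                     window=None, half_life_days=30.0):
    """Combine a text relevance score with time (illustrative assumption).

    If a (start, end) window is given, documents outside it score 0 and
    documents inside it keep their text score. Otherwise the text score
    is scaled by the recency weight.
    """
    if window is not None:
        start, end = window
        return text_score if start <= doc_time <= end else 0.0
    return text_score * recency_weight(doc_time, query_time, half_life_days)

# Usage: a document published 30 days before the query keeps half its score.
query_time = datetime(2012, 1, 31)
doc_time = datetime(2012, 1, 1)
score = time_aware_score(2.0, doc_time, query_time)
```

A multiplicative decay is only one way to fold time into ranking; a real system would have to decide per query whether recency or a bounded time frame is what the user actually wants.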
“In domains like search over social media, time gives us an extra piece of information when we try to predict which documents are relevant to a particular person. For instance, one of the most common problems in search is term weighting—identifying which words in documents and queries are the most indicative of their overall subject matter. An early result from this line of research showed that analyzing how a word's usage changes over time gives us a new way to model its more directly semantic properties,” said Efron.
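To make the term-weighting idea concrete, here is a minimal Python sketch of one way a word's usage over time could feed into its weight. It is not the method from Efron's research, which the article does not describe in detail. The sketch assumes documents are sets of tokens grouped by time period, and it combines a standard smoothed inverse document frequency with a simple peak-to-mean "burstiness" ratio; the function names and the way the two are multiplied are hypothetical.

```python
import math

def idf(term, docs):
    """Smoothed inverse document frequency over a list of token sets."""
    df = sum(1 for doc in docs if term in doc)
    return math.log((len(docs) + 1) / (df + 1)) + 1.0

def burstiness(term, docs_by_period):
    """Peak-to-mean ratio of the share of documents per period that
    contain the term. Evenly used terms score 1.0; terms concentrated
    in a few periods score higher."""
    rates = [
        sum(1 for doc in docs if term in doc) / len(docs)
        for docs in docs_by_period
        if docs  # skip empty periods
    ]
    if not rates:
        return 0.0
    mean = sum(rates) / len(rates)
    return max(rates) / mean if mean > 0 else 0.0

def temporal_weight(term, docs_by_period):
    """Hypothetical combined weight: IDF scaled by burstiness."""
    all_docs = [doc for docs in docs_by_period for doc in docs]
    return idf(term, all_docs) * burstiness(term, docs_by_period)

# Usage: three time periods of toy documents (made-up data).
periods = [
    [{"the", "weather"}, {"the", "game"}],
    [{"the", "quake"}, {"the", "quake", "news"}],
    [{"the", "game"}, {"the", "weather"}],
]
quake_weight = temporal_weight("quake", periods)
the_weight = temporal_weight("the", periods)
```

In this toy data, "quake" appears only in the middle period while "the" appears everywhere, so the temporal signal pushes "quake" well above "the" as an indicator of subject matter.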
In conjunction with his research, Efron is developing open-source software that can be used to improve information retrieval courses, especially those taught in iSchools. The software will include a series of labs that will illustrate how search engines work and will be informed in large part by the structure of Efron’s current courses in information retrieval at GSLIS.
The grant runs through September 2015.