An Effective Approach to Temporally Anchored Information Retrieval
Date
2012-08-17
Authors
Wei, Zheng
JaJa, Joseph
Abstract
In this paper we consider the information retrieval problem over a
collection of time-evolving documents, where each search is specified
by a query text together with a temporal specification. A
solution to this problem is critical for a number of emerging
large-scale applications involving archived collections of web content,
social network interactions, blog traffic, and information feeds. Given
a collection of time-evolving documents, we develop an effective
strategy to create inverted files and indexing structures such that a
temporally anchored query can be processed quickly using strategies
similar to those of the non-temporal case. The inverted files generated have exactly
the same structure as those generated for the classical (non-temporal)
case, and the size of the additional indexing structures is shown to be
small. Well-known previous algorithms for constructing inverted files or
for computing relevance can be extended to handle the temporal case.
Moreover, we present high-throughput, scalable parallel algorithms to
build the inverted files with the additional indexing structures on
multicore processors and clusters of multicore processors. We illustrate
the effectiveness of our approach through experimental tests on a number
of web archives, and compare both the space used by the indexing
structures and postings lists and the search time of our approach against
those of the traditional approach that ignores temporal information.
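
To make the notion of a temporally anchored query concrete, the following is a minimal sketch in Python. It is not the authors' method: the abstract does not describe their indexing structures, so this toy version simply assumes that every document version is indexed as its own entry in an ordinary postings list, with a separate side table recording each version's validity interval. The class name `TemporalIndex`, its methods, and the half-open interval convention are all hypothetical choices made for illustration.

```python
from collections import defaultdict


class TemporalIndex:
    """Toy index (an assumption, not the paper's design): each document
    version is indexed as its own entry in a classical postings list."""

    def __init__(self):
        # term -> list of version ids, in insertion (ascending) order
        self.postings = defaultdict(list)
        # version id -> (doc_id, start, end); kept outside the postings
        self.validity = []

    def add_version(self, doc_id, start, end, text):
        """Index one version of doc_id, valid over the interval [start, end)."""
        version_id = len(self.validity)
        self.validity.append((doc_id, start, end))
        for term in set(text.lower().split()):
            self.postings[term].append(version_id)
        return version_id

    def search(self, query, at_time):
        """Return sorted doc ids whose version valid at at_time
        contains every query term (boolean AND, no ranking)."""
        terms = query.lower().split()
        if not terms:
            return []
        # Classical conjunctive lookup over the postings lists.
        candidates = set(self.postings.get(terms[0], []))
        for term in terms[1:]:
            candidates &= set(self.postings.get(term, []))
        # Temporal filter using the side table of validity intervals.
        hits = set()
        for version_id in candidates:
            doc_id, start, end = self.validity[version_id]
            if start <= at_time < end:
                hits.add(doc_id)
        return sorted(hits)
```

In this sketch the postings lists keep the same shape as in a non-temporal index, and the temporal information lives only in the side table. A practical system would need something considerably more refined to keep that table small and to avoid scanning versions that fall outside the query time.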