Please use this identifier to cite or link to this item:
http://hdl.handle.net/1903/12879
| Title: | An Effective Approach to Temporally Anchored Information Retrieval |
| Authors: | Wei, Zheng; JaJa, Joseph |
| Type: | Technical Report |
| Issue Date: | 17-Aug-2012 |
| Series/Report no.: | UMIACS;UMIACS-TR-2012-10 |
| Abstract: | In this paper we consider the information retrieval problem over a
collection of time-evolving documents, where the search must be carried
out based on both a query text and a temporal specification. A solution
to this problem is critical for a number of emerging large-scale
applications involving archived collections of web content, social
network interactions, blog traffic, and information feeds. Given a
collection of time-evolving documents, we develop an effective strategy
to create inverted files and indexing structures such that a temporally
anchored query can be processed quickly, using strategies similar to
those of the non-temporal case. The inverted files generated have
exactly the same structure as those generated in the classical
(non-temporal) case, and the additional indexing structures are shown
to be small. Well-known existing algorithms for constructing inverted
files or for computing relevance can be extended to handle the temporal
case. Moreover, we present high-throughput, scalable parallel
algorithms that build the inverted files, together with the additional
indexing structures, on multicore processors and on clusters of
multicore processors. We illustrate the effectiveness of our approach
through experimental tests on a number of web archives, and we compare
our approach with the traditional approach, which ignores the temporal
information, in terms of search time and of the space used by the
indexing structures and postings lists. |
| Appears in Collections: | Technical Reports from UMIACS |
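The record itself gives no implementation details. To make the problem setting concrete, the following is a minimal Python sketch of a naive baseline, not the method of the report: each document version is indexed as a separate posting that carries a validity interval [start, end), and a query anchored at time t keeps only the postings whose interval contains t. The class `NaiveTemporalIndex`, its methods `add_version` and `query`, the integer timestamps, and the summed term-frequency score are all hypothetical choices made for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    doc_id: int
    start: int  # time at which this document version becomes valid
    end: int    # time at which it is superseded (exclusive)
    tf: int     # term frequency within this version


class NaiveTemporalIndex:
    """Hypothetical baseline: every document version is indexed separately,
    and each posting carries the version's validity interval [start, end)."""

    def __init__(self) -> None:
        self.postings: dict[str, list[Posting]] = defaultdict(list)

    def add_version(self, doc_id: int, start: int, end: int, text: str) -> None:
        # Count term frequencies for this version using whitespace tokenization.
        counts: dict[str, int] = defaultdict(int)
        for token in text.lower().split():
            counts[token] += 1
        for term, tf in counts.items():
            self.postings[term].append(Posting(doc_id, start, end, tf))

    def query(self, terms: list[str], t: int) -> list[tuple[int, int]]:
        """Conjunctive (AND) query anchored at time t.

        Returns (doc_id, score) pairs, where score is the summed term
        frequency, sorted by descending score and then by doc_id."""
        matched: dict[int, int] | None = None
        for term in terms:
            # Keep only postings whose validity interval contains t.
            live = {p.doc_id: p.tf for p in self.postings.get(term.lower(), [])
                    if p.start <= t < p.end}
            if matched is None:
                matched = live
            else:
                # Intersect with documents matched so far and accumulate scores.
                matched = {d: s + live[d] for d, s in matched.items() if d in live}
        if not matched:
            return []
        return sorted(matched.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    idx = NaiveTemporalIndex()
    idx.add_version(1, 0, 10, "temporal index search")
    idx.add_version(1, 10, 20, "index search search")
    idx.add_version(2, 5, 15, "temporal search")
    print(idx.query(["temporal", "search"], 12))  # [(2, 2)]
```

This baseline scans the postings of every version of every document regardless of t, which is the kind of cost that dedicated temporal indexing structures aim to reduce.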
All items in DRUM are protected by copyright, with all rights reserved.