A Semi-Discrete Matrix Decomposition for Latent Semantic Indexing in
Information Retrieval
Abstract
The vast amount of textual information available today is useless unless
it can be effectively and efficiently searched. In information
retrieval, we wish to match queries with relevant documents. Documents
can be represented by the terms that appear within them, but literal
matching of terms does not necessarily retrieve all relevant documents.
Latent Semantic Indexing represents documents by approximations and tends
to cluster documents on similar topics even if their term profiles are
somewhat different. This approximate representation is usually
accomplished using a low-rank singular value decomposition (SVD)
approximation. In this paper, we use an alternative decomposition, the
semi-discrete decomposition (SDD). In our tests, for equal query times,
the SDD does as well as the SVD and uses less than one-tenth the storage.
Additionally, we show how to update the SDD for a dynamically changing
document collection.
(Also cross-referenced as UMIACS-TR-96-92)
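
To make the idea in the abstract concrete, the following is a minimal sketch of a greedy semi-discrete decomposition in plain Python. It assumes the usual SDD form, in which a term-document matrix A is approximated by a sum of k terms d * x * y^T, where the vectors x and y have entries restricted to -1, 0, and 1 and each d is a nonnegative scalar. The function names (`best_ternary`, `sdd`, `reconstruct`, `frobenius_error`), the starting vector, the fixed number of inner iterations, and the toy matrix are all illustrative assumptions, not details taken from the report.

```python
def best_ternary(s):
    """Return v with entries in {-1, 0, 1} maximizing (v.s)^2 / (v.v)."""
    # Consider entries in order of decreasing magnitude; keeping the J largest
    # gives the value (sum of their magnitudes)^2 / J, so scan for the best J.
    order = sorted(range(len(s)), key=lambda i: -abs(s[i]))
    best_j, best_val, running = 1, -1.0, 0.0
    for j, i in enumerate(order, start=1):
        running += abs(s[i])
        val = running * running / j
        if val > best_val:
            best_j, best_val = j, val
    v = [0] * len(s)
    for i in order[:best_j]:
        v[i] = 1 if s[i] >= 0 else -1
    return v


def sdd(A, k, inner_iters=10):
    """Greedy k-term approximation A ~ sum_t D[t] * outer(X[t], Y[t])."""
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]          # residual, starts as a copy of A
    X, D, Y = [], [], []
    for t in range(k):
        y = [0] * n
        y[t % n] = 1                   # assumed starting vector: a unit vector
        for _ in range(inner_iters):   # alternate: fix y, pick x; fix x, pick y
            s = [sum(R[i][j] * y[j] for j in range(n)) for i in range(m)]
            x = best_ternary(s)
            z = [sum(R[i][j] * x[i] for i in range(m)) for j in range(n)]
            y = best_ternary(z)
        xRy = sum(z[j] * y[j] for j in range(n))   # equals x^T R y
        d = xRy / (sum(abs(v) for v in x) * sum(abs(v) for v in y))
        for i in range(m):             # subtract the new term from the residual
            for j in range(n):
                R[i][j] -= d * x[i] * y[j]
        X.append(x)
        D.append(d)
        Y.append(y)
    return X, D, Y


def reconstruct(X, D, Y):
    """Rebuild the dense approximation from the stored terms."""
    m, n = len(X[0]), len(Y[0])
    return [[sum(d * x[i] * y[j] for x, d, y in zip(X, D, Y))
             for j in range(n)] for i in range(m)]


def frobenius_error(A, X, D, Y):
    """Frobenius norm of A minus its SDD approximation."""
    B = reconstruct(X, D, Y)
    return sum((A[i][j] - B[i][j]) ** 2
               for i in range(len(A)) for j in range(len(A[0]))) ** 0.5


# Toy term-document matrix (rows: terms, columns: documents); values are made up.
A = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
]
X, D, Y = sdd(A, k=3)
print(frobenius_error(A, X, D, Y))
```

Because every entry of the x and y vectors takes one of only three values, each can be held in two bits, while a truncated SVD of the same rank stores a floating-point number per entry. This is the intuition behind the storage comparison in the abstract; the sketch makes no attempt to reproduce the reported results.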