Large Latent Semantic Indexing via a Semi-Discrete Matrix Decomposition
Date
1998-10-15
Authors
Kolda, Tamara G.
O'Leary, Dianne P.
Abstract
With the electronic storage of documents comes the possibility of
building search engines that can automatically choose documents relevant
to a given set of topics. In information retrieval, we wish to match
queries with relevant documents. Documents can be represented by the
terms that appear within them, but literal matching of terms does not
necessarily retrieve all relevant documents. There are a number of
information retrieval systems based on inexact matches. Latent Semantic
Indexing represents documents by approximations and tends to cluster
documents on similar topics even if their term profiles are somewhat
different. This approximate representation is usually accomplished using
a low-rank singular value decomposition (SVD) approximation. In this
paper, we use an alternate decomposition, the semi-discrete decomposition
(SDD). On the MEDLINE test set, for equal query times, the SDD matches the
retrieval performance of the SVD while using less than one-tenth of the storage.
(Also cross-referenced as UMIACS-TR-96-83)
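The abstract contrasts a truncated SVD with the semi-discrete decomposition. The following is a minimal pure-Python sketch of an SDD. It assumes the standard form A ≈ X_k D_k Y_k^T, where X_k and Y_k have entries restricted to {-1, 0, 1} and D_k is diagonal with positive entries, and it builds the decomposition greedily one rank-one term at a time. Several details are illustrative assumptions and are not necessarily the choices made in this report: the starting vector, the fixed number of inner iterations, the helper names (`best_ternary`, `sdd`, `reconstruct`, `frobenius`, `storage_bits`), and the 2-bits-per-entry storage estimate.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def best_ternary(s: List[float]) -> List[int]:
    """Return x in {-1, 0, 1}^len(s) maximizing (x . s)^2 / (x . x).

    The maximizer keeps the J entries of s with the largest magnitudes
    (signs matching s) and zeros elsewhere; J is found by scanning prefixes
    of the magnitude-sorted order.
    """
    order = sorted(range(len(s)), key=lambda i: -abs(s[i]))
    best_val, best_j, running = -1.0, 1, 0.0
    for j, i in enumerate(order, start=1):
        running += abs(s[i])
        val = running * running / j
        if val > best_val:
            best_val, best_j = val, j
    x = [0] * len(s)
    for i in order[:best_j]:
        x[i] = 1 if s[i] >= 0 else -1
    return x


def sdd(a: Matrix, k: int, inner_iters: int = 10) -> Tuple[List[List[int]], List[float], List[List[int]]]:
    """Greedy k-term SDD sketch: a ~= sum_t ds[t] * outer(xs[t], ys[t])."""
    m, n = len(a), len(a[0])
    r = [list(map(float, row)) for row in a]  # residual, updated after each term
    xs, ds, ys = [], [], []
    for _ in range(k):
        # Assumed heuristic: start y at the residual column with the largest norm.
        j0 = max(range(n), key=lambda j: sum(r[i][j] ** 2 for i in range(m)))
        y = [0] * n
        y[j0] = 1
        # Alternate: fix y and pick the best ternary x, then fix x and pick the best y.
        for _ in range(inner_iters):
            x = best_ternary([sum(r[i][j] * y[j] for j in range(n)) for i in range(m)])
            y = best_ternary([sum(r[i][j] * x[i] for i in range(m)) for j in range(n)])
        # For fixed x and y, the least-squares weight is x^T R y / (|x|^2 |y|^2).
        d = sum(x[i] * r[i][j] * y[j] for i in range(m) for j in range(n))
        d /= sum(v * v for v in x) * sum(v * v for v in y)
        if d == 0.0:
            break  # d is zero only when the residual is zero
        for i in range(m):
            for j in range(n):
                r[i][j] -= d * x[i] * y[j]
        xs.append(x)
        ds.append(d)
        ys.append(y)
    return xs, ds, ys


def reconstruct(xs: List[List[int]], ds: List[float], ys: List[List[int]]) -> Matrix:
    """Form the dense approximation X D Y^T from the stored terms."""
    m, n = len(xs[0]), len(ys[0])
    return [[sum(d * x[i] * y[j] for x, d, y in zip(xs, ds, ys)) for j in range(n)]
            for i in range(m)]


def frobenius(a: Matrix) -> float:
    return sum(v * v for row in a for v in row) ** 0.5


def storage_bits(m: int, n: int, k: int, float_bits: int = 64) -> Tuple[int, int]:
    """Rough bit counts: rank-k SVD (all floats) vs k-term SDD (assumed 2 bits per ternary entry)."""
    svd_bits = k * (m + n + 1) * float_bits
    sdd_bits = k * (2 * (m + n) + float_bits)
    return svd_bits, sdd_bits


if __name__ == "__main__":
    # Hypothetical 5-term x 4-document count matrix, for illustration only.
    a = [[1, 1, 0, 0], [1, 0, 1, 0], [0, 1, 1, 0], [0, 0, 0, 1], [0, 0, 1, 1]]
    for k in (1, 2, 3):
        xs, ds, ys = sdd(a, k)
        approx = reconstruct(xs, ds, ys)
        diff = [[a[i][j] - approx[i][j] for j in range(4)] for i in range(5)]
        print(k, round(frobenius(diff), 3))
```

Each SDD term stores only m + n ternary values and one scalar. By contrast, each SVD term stores m + n + 1 floating-point numbers. For the same number of terms, a truncated SVD is never a worse Frobenius-norm approximation (Eckart-Young), so an SDD generally needs more terms to reach similar accuracy. This is why comparing at equal query time, as the abstract does, is more informative than comparing at equal rank.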