Improving Information Retrieval Systems using Part of Speech Tagging
Abstract
The objective of Information Retrieval is to retrieve all documents relevant to a user query, and only those documents. Much research has focused on achieving this objective with little regard for storage overhead or performance. In this paper we evaluate the use of Part of Speech Tagging to reduce index storage overhead and improve the general speed of the system with only a minimal reduction in precision-recall measurements. We tagged 500 MB of the 1989 and 1990 Los Angeles Times document collection provided by TREC for parts of speech. We then experimented to find the most relevant part of speech to index. We show that 90 percent of precision-recall is achieved with 40 percent of the document collection's terms. We also show that this is an improvement in overhead with only a 1 percent reduction in precision-recall.
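As a rough illustration of the idea of indexing only selected parts of speech, the following minimal sketch builds an inverted index from documents that are already tagged and compares its size with a full index. It is not the paper's implementation: the sample documents, the Penn Treebank-style tags, the choice of nouns as the retained class, and the names `TAGGED_DOCS`, `build_index`, and `posting_count` are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical, hand-tagged sample data (Penn Treebank-style tags).
# A real system would obtain the tags from a part-of-speech tagger.
TAGGED_DOCS = {
    "d1": [("the", "DT"), ("mayor", "NN"), ("opened", "VBD"),
           ("a", "DT"), ("new", "JJ"), ("library", "NN")],
    "d2": [("the", "DT"), ("library", "NN"), ("closed", "VBD"),
           ("early", "RB")],
}


def build_index(tagged_docs, keep_prefixes=None):
    """Build an inverted index mapping each term to the set of doc ids.

    If keep_prefixes is given, only tokens whose tag starts with one of
    those prefixes are indexed; None means every token is indexed.
    """
    index = defaultdict(set)
    for doc_id, tokens in tagged_docs.items():
        for token, tag in tokens:
            if keep_prefixes is None or tag.startswith(tuple(keep_prefixes)):
                index[token.lower()].add(doc_id)
    return dict(index)


def posting_count(index):
    """Total number of (term, document) postings stored in the index."""
    return sum(len(doc_ids) for doc_ids in index.values())


full_index = build_index(TAGGED_DOCS)
# Assumption for illustration only: keep nouns (tags beginning with "NN").
noun_index = build_index(TAGGED_DOCS, keep_prefixes=("NN",))

print(len(full_index), posting_count(full_index))
print(len(noun_index), posting_count(noun_index))
```

Comparing the term and posting counts of the two indexes gives a simple measure of the storage saved by filtering; measuring the effect on precision and recall would additionally require queries and relevance judgments, which this sketch does not include.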