Browsing by Author "Oard, Douglas W."
CLIR Experiments at Maryland for TREC-2002: Evidence combination for Arabic-English retrieval (2003-04-04)
Darwish, Kareem; Oard, Douglas W.
The focus of the experiments reported in this paper was techniques for combining evidence for cross-language retrieval, searching Arabic documents using English queries. Evidence from multiple sources of translation knowledge was combined to estimate translation probabilities, and four techniques for estimating query-language term weights from document-language evidence were tried. A new technique that exploits translation probability information was found to outperform a comparable technique in which that information was not used. Comparative results for three variants of Arabic "light" stemming are also presented. A simple variant of an existing stemming algorithm was found to result in significantly better retrieval effectiveness. UMIACS-TR-2003-26 LAMP-TR-101

A Comparative Study of Knowledge-Based Approaches for Cross-Language Information Retrieval (1998-10-15)
Oard, Douglas W.; Dorr, Bonnie J.; Hackett, Paul G.; Katsova, Maria
Cross-language retrieval systems seek to use queries in one natural language to guide the retrieval of documents that might be written in another. Acquisition and representation of translation knowledge play a central role in this process. This paper explores the utility of two sources of manually encoded translation knowledge, bilingual dictionaries and translation lexicons, for cross-language retrieval. We have implemented six query translation techniques that use bilingual dictionaries, one based on lexical-semantic analysis, and one based on direct use of the translation output from an existing machine translation system; these are compared with a document translation technique that uses output from the same existing translation system.
Average precision measures on portions of the TREC collection suggest that arbitrarily selecting a single translation from a bilingual dictionary is typically no less effective than using every translation in the dictionary, that query translation using an existing machine translation system can achieve somewhat better effectiveness than simple dictionary-based techniques, and that performing document translation rather than query translation may result in further improvements in retrieval effectiveness under some conditions. (Also cross-referenced as UMIACS-TR-98-27)

Comparing User-assisted and Automatic Query Translation (2003-02-27)
He, Daqing; Wang, Jianqiang; Oard, Douglas W.; Nossal, Michael
For the 2002 Cross-Language Evaluation Forum Interactive Track, the University of Maryland team focused on query formulation and reformulation. Twelve people performed a total of forty-eight searches in the German document collection using English queries. Half of the searches were with user-assisted query translation, and half with fully automatic query translation. For the user-assisted query translation condition, participants were provided two types of cues about the meaning of each translation: a list of other terms with the same translation (potential synonyms), and a sentence in which the word was used in a translation-appropriate context. Four searchers performed the official iCLEF task; the other eight searched a smaller collection. Searchers performing the official task were able to make more accurate relevance judgments with user-assisted query translation for three of the four topics. We observed that the number of query iterations seems to vary systematically with topic, system, and collection, and we are analyzing query content and ranked retrieval measures to obtain further insight into these variations in search behavior.
UMIACS-TR-2003-23 LAMP-TR-098 HCIL-TR-2003-07

A Conceptual Framework for Text Filtering Process (1998-10-15)
Oard, Douglas W.; Marchionini, Gary
This report develops a conceptual framework for text filtering practice and research, and reviews present practice in the field. Text filtering is an information seeking process in which documents are selected from a dynamic text stream to satisfy a relatively stable and specific information need. A model of the information seeking process is introduced and specialized to define information filtering. The historical development of text filtering is then reviewed and case studies of recent work are used to highlight important design characteristics of modern text filtering systems. Specific techniques drawn from information retrieval, user modeling, machine learning and other related fields are described, and the report concludes with observations on the present state of the art and implications for future research on text filtering. (Also cross-referenced as CAR-TR-830) (Also cross-referenced as EE TR-96-25) (Also cross-referenced as CLIS TR-96-02)

The Effect of Bilingual Term List Size on Dictionary-Based Cross-Language Information Retrieval (2003-02-27)
Demner-Fushman, Dina; Oard, Douglas W.
Bilingual term lists are extensively used as a resource for dictionary-based Cross-Language Information Retrieval (CLIR), in which the goal is to find documents written in one natural language based on queries that are expressed in another. This paper identifies eight types of terms that affect retrieval effectiveness in CLIR applications through their coverage by general-purpose bilingual term lists, and reports results from an experimental evaluation of the coverage of 35 bilingual term lists in a news retrieval application. Retrieval effectiveness was found to be strongly influenced by term list size for lists that contain between 3,000 and 30,000 unique terms per language.
Supplemental techniques for named entity translation were found to be useful with even the largest lexicons. The contribution of named entity translation was evaluated in a cross-language experiment involving English and Chinese. Smaller effects were observed from deficiencies in the coverage of domain-specific terminology when searching news stories. UMIACS-TR-2003-22 LAMP-TR-097

NTCIR CLIR Experiments at the University of Maryland (2000-06-21)
Oard, Douglas W.; Wang, Jianqiang
This paper presents results for the Japanese/English cross-language information retrieval task on the NACSIS Test Collection. Two automatic dictionary-based query translation techniques were tried with four variants of the queries. The results indicate that longer queries outperform the required description-only queries and that use of the first translation in the edict dictionary is comparable with the use of every translation. Japanese term segmentation posed no unusual problems, which contrasts sharply with results previously obtained for cross-language retrieval between Chinese and English. (Also cross-referenced as UMIACS-TR-2000-47, LAMP-TR-054)

Probabilistic Structured Query Methods (2003-04-04)
Darwish, Kareem; Oard, Douglas W.
Structured methods for query term replacement rely on separate estimates of term frequency and document frequency to compute the weight for each query term. This paper reviews prior work on structured query techniques and introduces three new variants that leverage estimates of replacement probabilities. Statistically significant improvements in retrieval effectiveness are demonstrated for cross-language retrieval and for retrieval based on optical character recognition when replacement probabilities are used to estimate both term frequency and document frequency.
UMIACS-TR-2003-27 LAMP-TR-102

Speech-Based Information Retrieval for Digital Libraries (1998-11-26)
Oard, Douglas W.
Libraries and archives collect recorded speech and multimedia objects that contain recorded speech, and such material may comprise a substantial portion of the collection in future digital libraries. Presently, access to most of this material is provided using a combination of manually annotated metadata and linear search. Recent advances in speech processing technology have produced a number of techniques for extracting features from recorded speech that could provide a useful basis for the retrieval of speech or multimedia objects in large digital library collections. Among these features are the semantic content of the speech, the identity of the speaker, and the language in which the speech was spoken. We propose to develop a graphical and auditory user interface for speech-based information retrieval that exploits these features to facilitate selection of recorded speech and multimedia information objects that include recorded speech. We plan to use that interface to evaluate the effectiveness and usability of alternative ways of exploiting those features and as a testbed for the evaluation of advanced retrieval techniques such as cross-language speech retrieval. (Also cross-referenced as UMIACS-TR-97-36)

Structured Translation for Cross-Language Information Retrieval (2000-06-21)
Sperer, Ruth; Oard, Douglas W.
The paper introduces a query translation model that reflects the structure of the cross-language information retrieval task. The model is based on a structured bilingual dictionary in which the translations of each term are clustered into groups with distinct meanings. Query translation is modeled as a two-stage process, with the system first determining the intended meaning of a query term and then selecting translations appropriate to that meaning that might appear in the document collection.
An implementation of structured translation based on automatic dictionary clustering is described and evaluated by using Chinese queries to retrieve English documents. Structured translation achieved an average precision that was statistically indistinguishable from Pirkola's technique for very short queries, but Pirkola's technique outperformed structured translation on long queries. The paper concludes with some observations on future work to improve retrieval effectiveness and on other potential uses of structured translation in interactive cross-language retrieval applications. (Also cross-referenced as UMIACS-TR-2000-45, LAMP-TR-052)

A Survey of Information Retrieval and Filtering Methods (1998-10-15)
Faloutsos, Christos; Oard, Douglas W.
We survey the major techniques for information retrieval. In the first part, we provide an overview of the traditional ones (full text scanning, inversion, signature files and clustering). In the second part we discuss attempts to include semantic information (natural language processing, latent semantic indexing and neural networks).

A Survey of Multilingual Text Retrieval (1998-10-15)
Oard, Douglas W.; Dorr, Bonnie J.
This report reviews the present state of the art in selection of texts in one language based on queries in another, a problem we refer to as "multilingual" text retrieval. Present applications of multilingual text retrieval systems are limited by the cost and complexity of developing and using the multilingual thesauri on which they are based and by the level of user training that is required to achieve satisfactory search effectiveness. A general model for multilingual text retrieval is used to review the development of the field and to describe modern production and experimental systems. The report concludes with some observations on the present state of the art and an extensive bibliography of the technical literature on multilingual text retrieval.
The research reported herein was supported, in part, by Army Research Office contract DAAL03-91-C-0034 through Battelle Corporation, NSF NYI IRI-9357731, Alfred P. Sloan Research Fellow Award BR3336, and a General Research Board Semester Award. (Also cross-referenced as UMIACS-TR-96-19)

TDT-2002 Topic Tracking at Maryland: First Experiments with the Lemur Toolkit (2003-02-27)
He, Daqing; Park, Hyuk Ro; Murray, G. Craig; Subotin, Michael; Oard, Douglas W.
The University of Maryland submitted six topic tracking runs for the 2002 Topic Detection and Tracking evaluation. Two runs were produced using the Lemur language modeling toolkit; the remaining four were produced using a separate system coded in Perl. The Lemur runs outperformed the Perl runs on the required condition because term frequency information was better handled. Two of the Perl runs used native Arabic orthography with two-best translation based on a statistical lexicon, obtaining similar results to those obtained with the Arabic-to-English translations provided with the collection. UMIACS-TR-2003-24 LAMP-TR-099

TREC-8 Experiments at Maryland: CLIR, QA and Routing (2000-06-21)
Oard, Douglas W.; Wang, Jianqiang; Lin, Dekang; Soboroff, Ian
The University of Maryland team participated in four aspects of TREC-8: the ad hoc retrieval task, the main task in the cross-language retrieval (CLIR) track, the question answering track, and the routing task in the filtering track. The CLIR method was based on Pirkola's method for Dictionary-based Query Translation, using freely available dictionaries. Broad-coverage parsing and rule-based matching were used for question answering. Routing was performed using Latent Semantic Indexing in profile space.