DRUM: Digital Repository at the University of Maryland

Please use this identifier to cite or link to this item: http://hdl.handle.net/1903/832

Title: Bilingual Lexicon Construction Using Large Corpora
Authors: Shen, Wade; Dorr, Bonnie J.
Type: Technical Report
Issue Date: 15-Oct-1998
Series/Report no.: UM Computer Science Department, CS-TR-3666; UMIACS, UMIACS-TR-97-50
Abstract: This paper introduces a method for learning bilingual term-level and sentence-level alignments for the purpose of building lexicons. Combining statistical techniques with linguistic knowledge, it develops a general algorithm for learning term and sentence alignments from large bilingual corpora with high accuracy. This is achieved through filtered linguistic feedback between the term and sentence alignment processes. An implementation of this algorithm, TAG-ALIGN, is evaluated against approaches similar to those of [Brown et al. 1993], which apply Bayesian techniques for term alignment, and [Gale and Church 1991], a dynamic programming method for aligning sentences. The ultimate goal is to produce large, highly accurate bilingual lexicons from potentially noisy corpora. (Also cross-referenced as UMIACS-TR-97-50)
URI: http://hdl.handle.net/1903/832
Appears in Collections:Technical Reports of the Computer Science Department
Technical Reports from UMIACS

Files in This Item:

File            Description                           Size       Format      No. of Downloads
CS-TR-3666.pdf  Auto-generated copy of CS-TR-3666.ps  185.96 kB  Adobe PDF   314
CS-TR-3666.ps   -                                     202.83 kB  PostScript  203

All items in DRUM are protected by copyright, with all rights reserved.

 

DRUM is brought to you by the University of Maryland Libraries
University of Maryland, College Park, MD 20742-7011 (301)314-1328.