Bilingual Lexicon Construction Using Large Corpora

dc.contributor.author: Shen, Wade [en_US]
dc.contributor.author: Dorr, Bonnie J. [en_US]
dc.date.accessioned: 2004-05-31T22:40:17Z
dc.date.available: 2004-05-31T22:40:17Z
dc.date.created: 1997-10 [en_US]
dc.date.issued: 1998-10-15 [en_US]
dc.description.abstract: This paper introduces a method for learning bilingual term and sentence level alignments for the purpose of building lexicons. Combining statistical techniques with linguistic knowledge, a general algorithm is developed for learning term and sentence alignments from large bilingual corpora with high accuracy. This is achieved through the use of filtered linguistic feedback between term and sentence alignment processes. An implementation of this algorithm, TAG-ALIGN, is evaluated against approaches similar to [Brown et al. 1993], which apply Bayesian techniques for term alignment, and [Gale and Church 1991], a dynamic programming method for aligning sentences. The ultimate goal is to produce large bilingual lexicons with a high degree of accuracy from potentially noisy corpora. (Also cross-referenced as UMIACS-TR-97-50) [en_US]
dc.format.extent: 207701 bytes
dc.format.mimetype: application/postscript
dc.identifier.uri: http://hdl.handle.net/1903/832
dc.language.iso: en_US
dc.relation.isAvailableAt: Digital Repository at the University of Maryland [en_US]
dc.relation.isAvailableAt: University of Maryland (College Park, Md.) [en_US]
dc.relation.isAvailableAt: Tech Reports in Computer Science and Engineering [en_US]
dc.relation.isAvailableAt: UMIACS Technical Reports [en_US]
dc.relation.ispartofseries: UM Computer Science Department; CS-TR-3666 [en_US]
dc.relation.ispartofseries: UMIACS; UMIACS-TR-97-50 [en_US]
dc.title: Bilingual Lexicon Construction Using Large Corpora [en_US]
dc.type: Technical Report [en_US]

Files

Original bundle

Name: CS-TR-3666.ps
Size: 202.83 KB
Format: Postscript Files
Name: CS-TR-3666.pdf
Size: 185.96 KB
Format: Adobe Portable Document Format
Description: Auto-generated copy of CS-TR-3666.ps