Bilingual Lexicon Construction Using Large Corpora

Shen, Wade; Dorr, Bonnie J.

Files

CS-TR-3666.ps (202.83 KB)

CS-TR-3666.pdf (185.96 KB)

Date

1998-10-15

Authors

Shen, Wade

Dorr, Bonnie J.

Abstract

This paper introduces a method for learning bilingual term-level and sentence-level alignments for the purpose of building lexicons. By combining statistical techniques with linguistic knowledge, it develops a general algorithm for learning term and sentence alignments from large bilingual corpora with high accuracy. This is achieved through filtered linguistic feedback between the term and sentence alignment processes. An implementation of this algorithm, TAG-ALIGN, is evaluated against approaches similar to [Brown et al. 1993] that apply Bayesian techniques for term alignment, and [Gale and Church 1991], a dynamic programming method for aligning sentences. The ultimate goal is to produce large, highly accurate bilingual lexicons from potentially noisy corpora. (Also cross-referenced as UMIACS-TR-97-50)
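
For readers unfamiliar with the sentence-alignment baseline the abstract mentions, the following is a minimal sketch of length-based dynamic programming alignment. It is not TAG-ALIGN and it does not reproduce the probabilistic cost of [Gale and Church 1991]: the function name `align_sentences`, the two penalty parameters, and the cost itself (absolute difference in character length plus a fixed penalty per alignment type) are assumptions chosen only to illustrate the general idea.

```python
def align_sentences(src, tgt, skip_penalty=10, merge_penalty=3):
    """Align two lists of sentences by character length.

    Returns a list of (src_indices, tgt_indices) pairs. The cost is a
    simplified stand-in (assumption): |len(src side) - len(tgt side)| plus a
    fixed penalty for any alignment that is not one-to-one.
    """
    # Allowed alignment shapes: (sentences taken from src, from tgt, penalty).
    shapes = [
        (1, 1, 0),              # one-to-one
        (1, 0, skip_penalty),   # source sentence with no counterpart
        (0, 1, skip_penalty),   # target sentence with no counterpart
        (2, 1, merge_penalty),  # two source sentences to one target
        (1, 2, merge_penalty),  # one source sentence to two targets
    ]
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0

    # Fill the table in row-major order; every predecessor of a cell
    # has already been finalized when the cell is expanded.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj, penalty in shapes:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                src_len = sum(len(s) for s in src[i:ni])
                tgt_len = sum(len(t) for t in tgt[j:nj])
                candidate = cost[i][j] + abs(src_len - tgt_len) + penalty
                if candidate < cost[ni][nj]:
                    cost[ni][nj] = candidate
                    back[ni][nj] = (i, j)

    # Walk the back-pointers from the end to recover the alignment.
    alignment = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        alignment.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    alignment.reverse()
    return alignment
```

Called on two short lists where the second source sentence corresponds to two target sentences, the sketch groups them as a one-to-two alignment. A real system would replace the length-difference cost with a statistical model and, as the abstract describes, could feed term-level evidence back into the sentence alignment.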

URI (handle)

http://hdl.handle.net/1903/832

Collections

Technical Reports from UMIACS
Technical Reports of the Computer Science Department
