Please use this identifier to cite or link to this item:
http://hdl.handle.net/1903/832
| Title: | Bilingual Lexicon Construction Using Large Corpora |
| Authors: | Shen, Wade; Dorr, Bonnie J. |
| Type: | Technical Report |
| Issue Date: | 15-Oct-1998 |
| Series/Report no.: | UM Computer Science Department; CS-TR-3666 / UMIACS; UMIACS-TR-97-50 |
| Abstract: | This paper introduces a method for learning bilingual term-level and
sentence-level alignments for the purpose of building lexicons. Combining
statistical techniques with linguistic knowledge, a general algorithm
is developed for learning term and sentence alignments from large
bilingual corpora with high accuracy. This is achieved through the
use of filtered linguistic feedback between term and sentence alignment
processes. An implementation of this algorithm, TAG-ALIGN, is evaluated
against approaches similar to [Brown et al. 1993] that apply Bayesian
techniques for term alignment, and [Gale and Church 1991], a dynamic
programming method for aligning sentences. The ultimate goal is to
produce large bilingual lexicons with a high degree of accuracy from
potentially noisy corpora.
(Also cross-referenced as UMIACS-TR-97-50) |
| URI: | http://hdl.handle.net/1903/832 |
| Appears in Collections: | Technical Reports of the Computer Science Department; Technical Reports from UMIACS |