Browsing by Author "Ayan, Necip Fazil"
Now showing 1 - 2 of 2
Results Per Page
Sort Options
Item Combining Linguistic and Machine Learning Techniques for Word Alignment Improvement(2005-11-23) Ayan, Necip Fazil; Dorr, Bonnie J; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)Alignment of words, i.e., detection of corresponding units between two sentences that are translations of each other, has been shown to be crucial for the success of many NLP applications such as statistical machine translation (MT), construction of bilingual lexicons, word-sense disambiguation, and projection of resources between languages. With the availability of large parallel texts, statistical word alignment systems have proven to be quite successful on many language pairs. However, these systems are still faced with several challenges due to the complexity of the word alignment problem, lack of enough training data, difficulty learning statistics correctly, translation divergences, and lack of a means for incremental incorporation of linguistic knowledge. This thesis presents two new frameworks to improve existing word alignments using supervised learning techniques. In the first framework, two rule-based approaches are introduced. The first approach, Divergence Unraveling for Statistical MT (DUSTer), specifically targets translation divergences and corrects the alignment links related to them using a set of manually-crafted, linguistically-motivated rules. In the second approach, Alignment Link Projection (ALP), the rules are generated automatically by adapting transformation-based error-driven learning to the word alignment problem. By conditioning the rules on initial alignment and linguistic properties of the words, ALP manages to categorize the errors of the initial system and correct them. The second framework, Multi-Align, is an alignment combination framework based on classifier ensembles. The thesis presents a neural-network based implementation of Multi-Align, called NeurAlign. By treating individual alignments as classifiers, NeurAlign builds an additional model to learn how to combine the input alignments effectively. The evaluations show that the proposed techniques yield significant improvements (up to 40% relative error reduction) over existing word alignment systems on four different language pairs, even with limited manually annotated data. Moreover, all three systems allow an easy integration of linguistic knowledge into statistical models without the need for large modifications to existing systems. Finally, the improvements are analyzed using various measures, including the impact of improved word alignments in an external application---phrase-based MT.Item Domain Tuning of Bilingual Lexicons for MT(2003-02-27) Ayan, Necip Fazil; Dorr, Bonnie; Kolak, OkanOur overall objective is to translate a domain-specific document in a foreign language (in this case, Chinese) to English. Using automatically induced domain-specific, comparable documents and language-independent clustering, we apply domain-tuning techniques to a bilingual lexicon for downstream translation of the input document to English. We will describe our domain-tuning technique and demonstrate its effectiveness by comparing our results to manually constructed domain-specific vocabulary. Our coverage/accuracy experiments indicate that domain-tuned lexicons achieve 88% precision and 66% recall. We also ran a Bleu experiment to compare our domain-tuned version to its un-tuned counterpart in an IBM-style MT system. Our domain-tuned lexicons brought about an improvement in the Bleu scores: 9.4% higher than a system trained on a uniformly-weighted dictionary and 275% higher than a system trained on no dictionary at all. UMIACS-TR-2003-19 LAMP-TR-096