Breaking the Resource Bottleneck for Multilingual Parsing
dc.contributor.author | Hwa, Rebecca | en_US |
dc.contributor.author | Resnik, Philip | en_US |
dc.contributor.author | Weinberg, Amy | en_US |
dc.date.accessioned | 2004-05-31T23:17:45Z | |
dc.date.available | 2004-05-31T23:17:45Z | |
dc.date.created | 2002-04 | en_US |
dc.date.issued | 2002-05-22 | en_US |
dc.description.abstract | We propose a framework that enables the acquisition of annotation-heavy resources such as syntactic dependency tree corpora for low-resource languages by importing linguistic annotations from high-quality English resources. We present a large-scale experiment showing that Chinese dependency trees can be induced by using an English parser, a word alignment package, and a large corpus of sentence-aligned bilingual text. As part of the experiment, we evaluate the quality of a Chinese parser trained on the induced dependency treebank. We find that a parser trained in this manner outperforms some simple baselines in spite of the noise in the induced treebank. The results suggest that projecting syntactic structures from English is a viable option for acquiring annotated syntactic structures quickly and cheaply. We expect the quality of the induced treebank to improve when more sophisticated filtering and error-correction techniques are applied. (Also LAMP-TR-086) (Also UMIACS-TR-2002-35) | en_US |
dc.format.extent | 67343 bytes | |
dc.format.mimetype | application/pdf | |
dc.identifier.uri | http://hdl.handle.net/1903/1194 | |
dc.language.iso | en_US | |
dc.relation.isAvailableAt | Digital Repository at the University of Maryland | en_US |
dc.relation.isAvailableAt | University of Maryland (College Park, Md.) | en_US |
dc.relation.isAvailableAt | Tech Reports in Computer Science and Engineering | en_US |
dc.relation.isAvailableAt | UMIACS Technical Reports | en_US |
dc.relation.ispartofseries | UM Computer Science Department; CS-TR-4355 | en_US |
dc.relation.ispartofseries | LAMP-TR-086 | en_US |
dc.relation.ispartofseries | UMIACS; UMIACS-TR-2002-35 | en_US |
dc.title | Breaking the Resource Bottleneck for Multilingual Parsing | en_US |
dc.type | Technical Report | en_US |