Breaking the Resource Bottleneck for Multilingual Parsing
Abstract
We propose a framework that enables the acquisition of annotation-heavy resources, such as syntactic dependency treebanks, for low-resource languages by importing linguistic annotations from high-quality English resources. We present a large-scale experiment showing that Chinese dependency trees can be induced using an English parser, a word-alignment package, and a large corpus of sentence-aligned bilingual text. As part of the experiment, we evaluate the quality of a Chinese parser trained on the induced dependency treebank. We find that a parser trained in this manner outperforms some simple baselines in spite of the noise in the induced treebank. The results suggest that projecting syntactic structures from English is a viable option for acquiring annotated syntactic structures quickly and cheaply. We expect the quality of the induced treebank to improve when more sophisticated filtering and error-correction techniques are applied.
(Also LAMP-TR-086)
(Also UMIACS-TR-2002-35)
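The basic idea of projecting an English dependency tree onto a Chinese sentence through word alignments can be illustrated with a minimal sketch. It assumes one-to-one alignment links and encodes a tree as a list of head indices (-1 for the root); the function name `project_dependencies` and this data layout are hypothetical, and the sketch is not presented as the exact procedure used in the report. In particular, it does no handling of multiply aligned words and no filtering or error correction.

```python
from typing import Dict, List, Optional, Tuple


def project_dependencies(
    src_heads: List[int],
    alignment: List[Tuple[int, int]],
    tgt_len: int,
) -> List[Optional[int]]:
    """Project source-side (e.g. English) dependency heads onto a target sentence.

    Hypothetical encoding: src_heads[i] is the index of token i's head,
    or -1 if token i is the root. alignment holds (src_index, tgt_index)
    links and is assumed to be one-to-one. Target tokens that receive no
    projected head are left as None.
    """
    # Enforce the one-to-one assumption this simplified sketch relies on.
    src_seen, tgt_seen = set(), set()
    for s, t in alignment:
        if s in src_seen or t in tgt_seen:
            raise ValueError("expected one-to-one alignment links")
        src_seen.add(s)
        tgt_seen.add(t)

    src_to_tgt: Dict[int, int] = dict(alignment)
    tgt_heads: List[Optional[int]] = [None] * tgt_len

    for s, t in src_to_tgt.items():
        head = src_heads[s]
        if head == -1:
            # The source root projects to a target root.
            tgt_heads[t] = -1
        elif head in src_to_tgt:
            # Copy the edge head -> s onto the aligned target words.
            tgt_heads[t] = src_to_tgt[head]
        # If the source head is unaligned, the target word stays headless.
    return tgt_heads


# English "I saw her": I <- saw (root), her <- saw.
english_heads = [1, -1, 1]
# Chinese "我 看见 了 她"; 了 (index 2) has no English counterpart.
links = [(0, 0), (1, 1), (2, 3)]
chinese_heads = project_dependencies(english_heads, links, 4)
print(chinese_heads)  # [1, -1, None, 1]
```

Here the unaligned token receives no head, which illustrates one way a projected treebank ends up incomplete or noisy and why additional filtering or correction would be needed before training a parser on it.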