Browsing by Author "Resnik, Philip"
Item Breaking the Resource Bottleneck for Multilingual Parsing (2002-05-22)
Hwa, Rebecca; Resnik, Philip; Weinberg, Amy
We propose a framework that enables the acquisition of annotation-heavy resources such as syntactic dependency tree corpora for low-resource languages by importing linguistic annotations from high-quality English resources. We present a large-scale experiment showing that Chinese dependency trees can be induced by using an English parser, a word alignment package, and a large corpus of sentence-aligned bilingual text. As a part of the experiment, we evaluate the quality of a Chinese parser trained on the induced dependency treebank. We find that a parser trained in this manner outperforms some simple baselines in spite of the noise in the induced treebank. The results suggest that projecting syntactic structures from English is a viable option for acquiring annotated syntactic structures quickly and cheaply. We expect the quality of the induced treebank to improve when more sophisticated filtering and error-correction techniques are applied. (Also LAMP-TR-086) (Also UMIACS-TR-2002-35)

Item Evaluating Multilingual Gisting of Web Pages (1998-10-15)
Resnik, Philip
We describe a prototype system for multilingual gisting of Web pages, and present an evaluation methodology based on the notion of gisting as decision support. This evaluation paradigm is straightforward, rigorous, permits fair comparison of alternative approaches, and should easily generalize to evaluation in other situations where the user is faced with decision-making on the basis of information in restricted or alternative form. (Also cross-referenced as UMIACS-TR-97-39)

Item Evaluating Translational Correspondence using Annotation Projection (2003-04-04)
Hwa, Rebecca; Resnik, Philip; Weinberg, Amy; Kolak, Okan
Recently, statistical machine translation models have begun to take advantage of higher level linguistic structures such as syntactic dependencies.
Underlying these models is an assumption about the directness of translational correspondence between sentences in the two languages; however, the extent to which this assumption is valid and useful is not well understood. In this paper, we present an empirical study that quantifies the degree to which syntactic dependencies are preserved when parses are projected directly from English to Chinese. Our results show that although the direct correspondence assumption is often too restrictive, a small set of principled, elementary linguistic transformations can boost the quality of the projected Chinese parses by 76% relative to the unimproved baseline. (UMIACS-TR-2003-25, LAMP-TR-100)

Item Extending Phrase-Based Decoding with a Dependency-Based Reordering Model (2009-12-23)
Hunter, Tim; Resnik, Philip
Phrase-based decoding is conceptually simple and straightforward to implement, at the cost of drastically oversimplified reordering models. Syntactically aware models make it possible to capture linguistically relevant relationships in order to improve word order, but they can be more complex to implement and optimise. In this paper, we explore a new middle ground between phrase-based and syntactically informed statistical MT, in the form of a model that supplements conventional, non-hierarchical phrase-based techniques with linguistically informed reordering based on syntactic dependency trees. The key idea is to exploit linguistically informed hierarchical structures only for those dependencies that cannot be captured within a single flat phrase. For very local dependencies we leverage the success of conventional phrase-based approaches, which provide a sequence of target-language words appropriately ordered and ready-made with the appropriate agreement morphology. Working with dependency trees rather than constituency trees allows us to take advantage of the flexibility of phrase-based systems to treat non-constituent fragments as phrases.
We do impose a requirement on what can be translated as a phrase (that the fragment be a novel sort of "dependency constituent"), but this is much weaker than the requirement that phrases be traditional linguistic constituents, which has often proven too restrictive in MT systems.

Item Gibbs Sampling for the Uninitiated (2010-04-16)
Resnik, Philip; Hardisty, Eric
This document is intended for computer scientists who would like to try out a Markov Chain Monte Carlo (MCMC) technique, particularly in order to do inference with Bayesian models on problems related to text processing. We try to keep theory to the absolute minimum needed, though we work through the details much more explicitly than you usually see even in "introductory" explanations. That means we've attempted to be ridiculously explicit in our exposition and notation. After providing the reasons and reasoning behind Gibbs sampling (and at least nodding our heads in the direction of theory), we work through an example application in detail: the derivation of a Gibbs sampler for a Naive Bayes model. Along with the example, we discuss some practical implementation issues, including the integrating out of continuous parameters when possible. We conclude with some pointers to literature that we've found to be somewhat more friendly to uninitiated readers.

Item Lexical Resource Integration across the Syntax-Semantics Interface (2001-05-10)
Green, Rebecca; Pearl, Lisa; Dorr, Bonnie J.; Resnik, Philip
This paper examines extending a database of English verbs, grouped into syntactico-semantic classes, with WordNet senses. Probabilistic associations between theta-grids and WordNet verb frames, SEMCOR frequency data, and disambiguation based on an information-theoretic notion of semantic similarity are used. Mapping successes and failures are illustrated with 'drop'.
(Cross-referenced as UMIACS-TR-2001-18) (Cross-referenced as LAMP-TR-068)

Item THE LINGUIST'S SEARCH ENGINE: GETTING STARTED GUIDE (2003-12-18)
Resnik, Philip; Elkiss, Aaron
The World Wide Web can be viewed as a naturally occurring resource that embodies the rich and dynamic nature of language, a data repository of unparalleled size and diversity. However, current Web search methods are oriented more toward shallow information retrieval techniques than toward the more sophisticated needs of linguists. Using the Web in linguistic research is not easy. It will, however, be getting easier. This report introduces the Linguist's Search Engine, a new linguist-friendly tool that makes it possible to retrieve naturally occurring sentences from the World Wide Web on the basis of lexical content and syntactic structure. Its aim is to help linguists of all stripes in conducting more thoroughly empirical exploration of evidence, with particular attention to variability and the role of context. (LAMP-TR-108, UMIACS-TR-2003-109)

Item Mapping Lexical Entries in a Verbs Database to WordNet Senses (2001-05-10)
Green, Rebecca; Pearl, Lisa; Dorr, Bonnie J.; Resnik, Philip
This paper describes automatic techniques for mapping 9611 entries in a database of English verbs to WordNet senses. The verbs were initially grouped into 491 classes based on syntactic categories. Mapping these classified verbs into WordNet senses provides a resource that may be used for disambiguation in multilingual applications such as machine translation and cross-language information retrieval. Our techniques make use of (1) a training set of 1791 disambiguated entries, representing 1442 verb entries from 167 of the categories; (2) word sense probabilities based on frequency counts in a previously tagged corpus; (3) semantic similarity of WordNet senses for verbs within the same class; (4) probabilistic correlations between WordNet data and attributes of the verb classes.
The best results achieved 72% precision and 58% recall, versus a lower bound of 62% precision and 38% recall for assigning the most frequently occurring WordNet sense, and an upper bound of 87% precision and 75% recall for human judgment. (Cross-referenced as UMIACS-TR-2001-18) (Cross-referenced as LAMP-TR-068)

Item Measuring Verb Similarity (2000-06-21)
Resnik, Philip; Diab, Mona
The way we model semantic similarity is closely tied to our understanding of linguistic representations. We present several models of semantic similarity, based on differing representational assumptions, and investigate their properties via comparison with human ratings of verb similarity. The results offer insight into the bases for human similarity judgments and provide a testbed for further investigation of the interactions among syntactic properties, semantic structure, and semantic content. (Also cross-referenced as UMIACS-TR-2000-40, LAMP-TR-047)

Item Parallel Strands: A Preliminary Investigation into Mining the Web for Bilingual Text (1998-10-15)
Resnik, Philip
Parallel corpora are a valuable resource for machine translation, but at present their availability and utility are limited by genre- and domain-specificity, licensing restrictions, and the basic difficulty of locating parallel texts in all but the most dominant of the world's languages. A parallel corpus resource not yet explored is the World Wide Web, which hosts an abundance of pages in parallel translation, offering a potential solution to some of these problems and unique opportunities of its own. This paper presents the necessary first step in that exploration: a method for automatically finding parallel translated documents on the Web. The technique is conceptually simple, fully language independent, and scalable, and preliminary evaluation results indicate that the method may be accurate enough to apply without human intervention.
(Also cross-referenced as UMIACS-TR-98-41)

Item The Web as a Parallel Corpus (2002-07-09)
Resnik, Philip; Smith, Noah A.
Also UMIACS-TR-2002-61. Also LAMP-TR-089.