Browsing by Author "Dorr, Bonnie J."
Now showing 1 - 18 of 18
Aspectual Modifications to a LCS Database for NLP Applications (1998-10-15)
Dorr, Bonnie J.; Olsen, Mari Broman
Verbal and compositional lexical aspect provide the underlying temporal structure of events. Knowledge of lexical aspect, e.g., (a)telicity, is therefore required for interpreting event sequences in discourse (Dowty, 1986; Moens and Steedman, 1988; Passonneau, 1988), interfacing to temporal databases (Androutsopoulos, 1996), processing temporal modifiers (Antonisse, 1994), describing allowable alternations and their semantic effects (Resnik, 1996; Tenny, 1994), and selecting tense and lexical items for natural language generation (Dorr and Olsen, 1996; Klavans and Chodorow, 1992; cf. Slobin and Bocaz, 1988). We show that it is possible to represent lexical aspect, both verbal and compositional, on a large scale, using Lexical Conceptual Structure (LCS) representations of verbs in the classes cataloged by Levin (1993). We show how proper consideration of these universal pieces of verb meaning may be used to refine lexical representations and derive a range of meanings from combinations of LCS representations. A single algorithm may therefore be used to determine lexical aspect classes and features at both verbal and sentence levels. Finally, we illustrate how knowledge of lexical aspect facilitates the interpretation of events in NLP applications. (Also cross-referenced as UMIACS-TR-97-21) (Also cross-referenced as LAMP-TR-007)

Automatic Extraction of Semantic Classes from Syntactic Information in Online Resources (1998-10-15)
Dorr, Bonnie J.; Jones, Doug
This paper addresses the issue of word-sense ambiguity in extraction from machine-readable resources for the construction of large-scale knowledge sources.
We describe two experiments: one that took word-sense distinctions into account, resulting in 97.9% accuracy for semantic classification of verbs based on Levin (1993), and one that ignored word-sense distinctions, resulting in 6.3% accuracy. These experiments had a dual purpose: (1) to validate the central thesis of the work of Levin (1993), i.e., that verb semantics and syntactic behavior are predictably related; and (2) to demonstrate that a 20-fold improvement can be achieved in deriving semantic information from syntactic cues if we first divide the syntactic cues into distinct groupings that correlate with different word senses. Finally, we show that we can provide effective acquisition techniques for novel word senses using a combination of online sources. (Also cross-referenced as UMIACS-TR-95-65)

Bilingual Lexicon Construction Using Large Corpora (1998-10-15)
Shen, Wade; Dorr, Bonnie J.
This paper introduces a method for learning bilingual term-level and sentence-level alignments for the purpose of building lexicons. Combining statistical techniques with linguistic knowledge, we develop a general algorithm for learning term and sentence alignments from large bilingual corpora with high accuracy. This is achieved through the use of filtered linguistic feedback between the term and sentence alignment processes. An implementation of this algorithm, TAG-ALIGN, is evaluated against approaches similar to [Brown et al. 1993], which apply Bayesian techniques for term alignment, and to [Gale and Church 1991], a dynamic programming method for aligning sentences. The ultimate goal is to produce large bilingual lexicons with a high degree of accuracy from potentially noisy corpora. (Also cross-referenced as UMIACS-TR-97-50)

Chinese-English Semantic Resource Construction (2000-06-13)
Dorr, Bonnie J.; Levow, Gina-Anne; Lin, Dekang; Thomas, Scott
We describe an approach to large-scale construction of a semantic lexicon for Chinese verbs.
We draw on three existing resources: a classification of English verbs called EVCA (English Verb Classes and Alternations) [Levin, 1993], a Chinese conceptual database called HowNet [Zhendong, 1988c; Zhendong, 1988b] (http://www.how-net.com), and a large machine-readable dictionary called Optilex. The resulting lexicon is used for determining appropriate word senses in applications such as machine translation and cross-language information retrieval. (Also cross-referenced as UMIACS-TR-2000-27) (Also cross-referenced as LAMP-TR-044)

A Comparative Study of Knowledge-Based Approaches for Cross-Language Information Retrieval (1998-10-15)
Oard, Douglas W.; Dorr, Bonnie J.; Hackett, Paul G.; Katsova, Maria
Cross-language retrieval systems seek to use queries in one natural language to guide the retrieval of documents that might be written in another. Acquisition and representation of translation knowledge play a central role in this process. This paper explores the utility of two sources of manually encoded translation knowledge, bilingual dictionaries and translation lexicons, for cross-language retrieval. We have implemented six query translation techniques that use bilingual dictionaries, one based on lexical-semantic analysis, and one based on direct use of the translation output from an existing machine translation system; these are compared with a document translation technique that uses output from the same existing translation system.
Average precision measures on portions of the TREC collection suggest that arbitrarily selecting a single translation from a bilingual dictionary is typically no less effective than using every translation in the dictionary, that query translation using an existing machine translation system can achieve somewhat better effectiveness than simple dictionary-based techniques, and that performing document translation rather than query translation may further improve retrieval effectiveness under some conditions. (Also cross-referenced as UMIACS-TR-98-27)

Constraints on the Generation of Tense, Aspect, and Connecting Words from Temporal Expressions (2002-08-30)
Dorr, Bonnie J.; Gaasterland, Terry
Generating language that reflects the temporal organization of represented knowledge requires a language generation model that integrates contemporary theories of tense and aspect, temporal representations, and methods to plan text. This paper presents a model that produces event combinations and appropriate connecting words to relate them. We distinguish between inherent and non-inherent aspectual features of verbs and describe an algorithm that uses these features to select tense, aspect, and temporal connecting words for generating text based on time-stamped information. The main result of this work is the successful incorporation of constrained linguistic theories of tense and aspect in a self-contained module called CONGEN that produces a ranked list of temporal connectives and tense/aspect possibilities from pairs of time-stamped literals. We show that the theoretical results described herein have been verified in a large-scale corpus analysis. The framework serves as the basis of a component designed to enhance the English output of a constrained generation system.
(LAMP-TR-091) (UMIACS-TR-2002-71)

Development of Cross-Linguistic Syntactic and Semantic Parameters for Parsing and Generation (1998-10-15)
Dorr, Bonnie J.
This document reports on research conducted at the University of Maryland for the Korean/English Machine Translation (MT) project. The translation approach adopted here is interlingual, i.e., a single underlying representation called Lexical Conceptual Structure (LCS) is used for both Korean and English. The primary focus of this investigation is the notion of 'parameterization', i.e., a mechanism that accounts for both syntactic and lexical-semantic distinctions between Korean and English. We present our assumptions about the syntactic structure of Korean-type languages vs. English-type languages and describe our investigation of syntactic parameterization for distinguishing between these two types of languages. We also present the details of the LCS structure and describe how this representation is parameterized so that it accommodates both languages. We address critical issues concerning interlingual machine translation, such as locative postpositions and the dividing line between the interlingua and the knowledge representation. Difficulties in the translation and transliteration of Korean are discussed, and complex morphological properties of Korean are presented. Finally, we describe recent work on lexical acquisition and conclude with a discussion of two hypotheses concerning semantic classification that are currently being tested. (Also cross-referenced as UMIACS-TR-94-26)

Development of Interlingual Lexical Conceptual Structures with Syntactic Markers for Machine Translation (1998-10-15)
Dorr, Bonnie J.; Lee, Jye-hoon; Voss, Clare; Suh, Sungki
This document reports on research conducted at the University of Maryland for the Korean/English Machine Translation (MT) project.
Our primary objective was to develop an interlingual representation based on lexical conceptual structure (LCS) and to examine the relation between this representation and a set of linguistically motivated semantic classes. We view the work of the past year as a critical step toward achieving our goal of building a generator: the classification of LCSs into a semantic hierarchy provides a systematic mapping between semantic knowledge about verbs and their surface syntactic structures. We have focused on several areas in support of our objectives: (1) investigation of morphological structure, including distinctions between Korean and English; (2) porting a fast, message-passing parser to Korean (and to the IBM PC); (3) study of free word order and development of the associated processing algorithm; (4) investigation of the aspectual dimension as it impacts morphology, syntax, and lexical semantics; (5) investigation of the relation between semantic classes and syntactic structure; (6) development of theta-role and lexical-semantic templates through lexical acquisition techniques; (7) definition of a mapping between knowledge representation (KR) concepts and interlingual representations; and (8) formalization of the lexical conceptual structure. (Also cross-referenced as UMIACS-TR-95-16)

Large-Scale Construction of a Chinese-English Semantic Hierarchy (2000-06-13)
Dorr, Bonnie J.; Levow, Gina-Anne; Lin, Dekang
This paper addresses the problem of building conceptual resources for multilingual applications. We describe new techniques for large-scale construction of a semantic hierarchy for Chinese verbs, using thematic-role information to create links between Chinese concepts and English classes. We then present an approach to compensating for gaps in the existing resources. The resulting hierarchy is used for a multilingual lexicon for Chinese-English machine translation and cross-language information retrieval applications.
(Also cross-referenced as UMIACS-TR-2000-17) (Also cross-referenced as LAMP-TR-040)

Lexical Resource Integration across the Syntax-Semantics Interface (2001-05-10)
Green, Rebecca; Pearl, Lisa; Dorr, Bonnie J.; Resnik, Philip
This paper examines extending a database of English verbs, grouped into syntactico-semantic classes, with WordNet senses. The approach uses probabilistic associations between theta-grids and WordNet verb frames, SEMCOR frequency data, and disambiguation based on an information-theoretic notion of semantic similarity. Mapping successes and failures are illustrated with the verb 'drop'. (Cross-referenced as UMIACS-TR-2001-18) (Cross-referenced as LAMP-TR-068)

Lexical Selection for Cross-Language Applications: Combining LCS with WordNet (1998-10-15)
Dorr, Bonnie J.; Katsova, Maria
This paper describes experiments for testing the power of large-scale resources for lexical selection in machine translation (MT) and cross-language information retrieval (CLIR). We adopt the view that verbs with similar argument structure share certain meaning components, but that those meaning components are more relevant to argument realization than to idiosyncratic verb meaning. We verify this by demonstrating that verbs with similar argument structure as encoded in Lexical Conceptual Structure (LCS) are rarely synonymous in WordNet. We then use the results of this work to guide our implementation of an algorithm for cross-language selection of lexical items, exploiting the strengths of each resource: LCS for semantic structure and WordNet for semantic content. We use the Parka Knowledge-Based System to encode LCS representations and WordNet synonym sets, and we implement our lexical-selection algorithm as Parka-based queries into a knowledge base containing both information types.
(Also cross-referenced as UMIACS-TR-98-49) (Also cross-referenced as LAMP-TR-021)

LEXICALL: Lexicon Construction for Foreign Language Tutoring (1998-10-15)
Dorr, Bonnie J.
We focus on the problem of building large repositories of lexical conceptual structure (LCS) representations for verbs in multiple languages. One of the main results of this work is the definition of a relation between broad semantic classes and LCS meaning components. Our acquisition program, LEXICALL, takes as input the results of previous work on verb classification and thematic grid tagging, and outputs LCS representations for different languages. These representations have been ported into English, Arabic, and Spanish lexicons, each containing approximately 9000 verbs. We are currently using these lexicons in operational foreign language tutoring and machine translation systems. (Also cross-referenced as UMIACS-TR-97-09)

Mapping Lexical Entries in a Verbs Database to WordNet Senses (2001-05-10)
Green, Rebecca; Pearl, Lisa; Dorr, Bonnie J.; Resnik, Philip
This paper describes automatic techniques for mapping 9611 entries in a database of English verbs to WordNet senses. The verbs were initially grouped into 491 classes based on syntactic categories. Mapping these classified verbs into WordNet senses provides a resource that may be used for disambiguation in multilingual applications such as machine translation and cross-language information retrieval. Our techniques make use of (1) a training set of 1791 disambiguated entries, representing 1442 verb entries from 167 of the categories; (2) word sense probabilities based on frequency counts in a previously tagged corpus; (3) semantic similarity of WordNet senses for verbs within the same class; and (4) probabilistic correlations between WordNet data and attributes of the verb classes.
The best results achieved 72% precision and 58% recall, versus a lower bound of 62% precision and 38% recall for assigning the most frequently occurring WordNet sense, and an upper bound of 87% precision and 75% recall for human judgment. (Cross-referenced as UMIACS-TR-2001-18) (Cross-referenced as LAMP-TR-068)

A Survey of Current Paradigms in Machine Translation (1998-12-02)
Dorr, Bonnie J.; Jordan, Pamela W.; Benoit, John W.
This paper is a survey of current machine translation research in the US, Europe, and Japan. A short history of machine translation is presented first, followed by an overview of current research. Representative examples of a wide range of approaches adopted by machine translation researchers are presented. These are described in detail, along with a discussion of the practicalities of scaling up these approaches for operational environments. In support of this discussion, issues in, and techniques for, evaluating machine translation systems are discussed. (Also cross-referenced as UMIACS-TR-98-72)

A Survey of Multilingual Text Retrieval (1998-10-15)
Oard, Douglas W.; Dorr, Bonnie J.
This report reviews the present state of the art in selecting texts in one language based on queries in another, a problem we refer to as "multilingual" text retrieval. Present applications of multilingual text retrieval systems are limited by the cost and complexity of developing and using the multilingual thesauri on which they are based and by the level of user training that is required to achieve satisfactory search effectiveness. A general model for multilingual text retrieval is used to review the development of the field and to describe modern production and experimental systems. The report concludes with some observations on the present state of the art and an extensive bibliography of the technical literature on multilingual text retrieval.
The research reported herein was supported, in part, by Army Research Office contract DAAL03-91-C-0034 through Battelle Corporation, NSF NYI IRI-9357731, Alfred P. Sloan Research Fellow Award BR3336, and a General Research Board Semester Award. (Also cross-referenced as UMIACS-TR-96-19)

A Thematic Hierarchy for Efficient Generation from Lexical-Conceptual Structure (1998-10-15)
Dorr, Bonnie J.; Habash, Nizar; Traum, David
This paper describes an implemented algorithm for syntactic realization of a target-language sentence from an interlingual representation called Lexical Conceptual Structure (LCS). We provide a mapping between LCS thematic roles and Abstract Meaning Representation (AMR) relations; these relations serve as input to an off-the-shelf generator (Nitrogen). This work makes two contributions: (1) the development of a thematic hierarchy that provides ordering information for realizing arguments in their surface positions; and (2) a diagnostic tool for detecting inconsistencies in an existing online LCS-based lexicon, which allows us to enhance principles for thematic-role assignment. (Also cross-referenced as UMIACS-TR-98-50) (Also cross-referenced as LAMP-TR-022)

Toward Compact Monotonically Compositional Interlingua Using Lexical Aspect (1998-10-15)
Dorr, Bonnie J.; Olsen, Mari Broman; Thomas, Scott C.
We describe a theoretical investigation into the semantic space described by our interlingua (IL), which currently has 191 main verb classes divided into 434 subclasses, represented by 237 distinct Lexical Conceptual Structures (LCSs). Using the model of aspect in Olsen (1994b, 1997a), monotonic aspectual composition, we have identified 71 aspectually basic subclasses that are associated with one or more of 68 aspectually non-basic classes via some lexical ("type-shifting") rule (Bresnan 1982, Pinker 1984, Levin and Rappaport Hovav 1995).
This allows us to refine the IL and address certain computational and theoretical issues at the same time. (1) From a linguistic viewpoint, the expected benefits include a refinement of the aspectual model in Olsen (1994b, 1997a), which provides necessary but not sufficient conditions for aspectual composition, and a refinement of the verb classifications in Levin (1993); we also expect our approach to eventually produce a systematic definition (in terms of LCSs and compositional operations) of the precise meaning components responsible for Levin's classification. (2) Computationally, the lexicon is made more compact. (Also cross-referenced as UMIACS-TR-97-86) (Also cross-referenced as LAMP-TR-012)

Using WordNet to Posit Hierarchical Structure in Levin's Verb Classes (1998-10-15)
Olsen, Mari Broman; Dorr, Bonnie J.; Clark, David J.
In this paper we report on experiments using WordNet synset tags to evaluate the semantic properties of the verb classes cataloged by Levin (1993). This paper represents ongoing research begun at the University of Pennsylvania (Rosenzweig et al. 1997, Palmer et al. 1997) and the University of Maryland (Dorr and Jones 1996b, 1996d, 1996e). Using WordNet sense tags to constrain the intersection of Levin classes, we avoid spurious class intersections introduced by homonymy and polysemy (run a bath, run a mile). By adding class intersections based on a single shared sense-tagged word, we minimize the impact of the non-exhaustiveness of Levin's database (Dorr and Olsen 1996, Dorr to appear). By examining the syntactic properties of the intersective classes, we provide a clearer picture of the relationship between WordNet/EuroWordNet and the LCS interlingua for machine translation and other NLP applications. (Also cross-referenced as UMIACS-TR-97-85) (Also cross-referenced as LAMP-TR-011)
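The sense-constrained class intersection described in the Olsen, Dorr, and Clark abstract can be illustrated with a minimal sketch, assuming each verb class is modeled as a set of (verb, sense tag) pairs. This is not the authors' implementation: the class names, verbs, and sense tags (such as "run#bath" and "run#mile") are hypothetical toy data, not Levin's class labels or real WordNet synset identifiers. Comparing bare lemmas lets the homonym "run" link every class that contains it; comparing sense-tagged members keeps only the link whose shared member carries the same sense.

```python
from itertools import combinations

# Hypothetical toy classes: each maps to a set of (verb lemma, sense tag)
# pairs. Class names and sense tags are invented for illustration; they are
# not Levin's class labels or real WordNet synset identifiers.
CLASSES = {
    "motion_class": {("run", "run#mile"), ("jog", "jog#1")},
    "fill_class": {("run", "run#bath"), ("fill", "fill#1")},
    "flow_class": {("run", "run#bath"), ("pour", "pour#1")},
}


def intersecting_pairs(classes, use_senses):
    """Return (class, class) name pairs that share at least one member.

    With use_senses=False, members are compared by bare lemma, so a
    homonym like 'run' links otherwise unrelated classes. With
    use_senses=True, members must match on the full (lemma, sense) pair.
    """
    def key(members):
        return members if use_senses else {lemma for lemma, _ in members}

    pairs = set()
    # Sorting by class name gives each pair a stable (a, b) order with a < b.
    for (a, ma), (b, mb) in combinations(sorted(classes.items()), 2):
        if key(ma) & key(mb):  # a single shared member counts as an intersection
            pairs.add((a, b))
    return pairs


lemma_pairs = intersecting_pairs(CLASSES, use_senses=False)
sense_pairs = intersecting_pairs(CLASSES, use_senses=True)
print(sorted(lemma_pairs - sense_pairs))
# prints [('fill_class', 'motion_class'), ('flow_class', 'motion_class')]
```

Under these toy assumptions, the two lemma-only links involving motion_class are the spurious intersections that sense tags filter out, while fill_class and flow_class still intersect through their single shared sense-tagged member.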