Center for Advanced Study of Language Research Works
Browsing Center for Advanced Study of Language Research Works by Subject "automatic modality tagging"
A Modality Lexicon and its use in Automatic Tagging
(European Language Resources Association, 2010-05)
Baker, Kathryn; Bloodgood, Michael; Dorr, Bonnie; Filardo, Nathaniel; Levin, Lori; Piatko, Christine
This paper describes our resource-building results for an eight-week JHU Human Language Technology Center of Excellence Summer Camp for Applied Language Exploration (SCALE-2009) on Semantically-Informed Machine Translation. Specifically, we describe the construction of a modality annotation scheme, a modality lexicon, and two automated modality taggers that were built using the lexicon and annotation scheme. Our annotation scheme is based on identifying three components of modality: a trigger, a target and a holder. We describe how our modality lexicon was produced semi-automatically, expanding from an initial hand-selected list of modality trigger words and phrases. The resulting expanded modality lexicon is being made publicly available. We demonstrate that one tagger, a structure-based tagger, results in precision around 86% (depending on genre) for tagging of a standard LDC data set. In a machine translation application, using the structure-based tagger to annotate English modalities on an English-Urdu training corpus improved the translation quality score for Urdu by 0.3 BLEU points in the face of sparse training data.

Statistical Modality Tagging from Rule-based Annotations and Crowdsourcing
(Association for Computational Linguistics, 2012-07-13)
Prabhakaran, Vinodkumar; Bloodgood, Michael; Diab, Mona; Dorr, Bonnie; Levin, Lori; Piatko, Christine; Rambow, Owen; Van Durme, Benjamin
We explore training an automatic modality tagger. Modality is the attitude that a speaker might have toward an event or state. One of the main hurdles for training a linguistic tagger is gathering training data. This is particularly problematic for training a tagger for modality because modality triggers are sparse for the overwhelming majority of sentences. We investigate an approach to automatically training a modality tagger where we first gathered sentences based on a high-recall simple rule-based modality tagger and then provided these sentences to Mechanical Turk annotators for further annotation. We used the resulting set of training data to train a precise modality tagger using a multi-class SVM that delivers good performance.