A Modality Lexicon and its use in Automatic Tagging

Baker, Kathryn; Bloodgood, Michael; Dorr, Bonnie; Filardo, Nathaniel; Levin, Lori; Piatko, Christine

Files

modalityTaggingLREC2010.pdf (352.23 KB)

Date

2010-05

Citation

Kathryn Baker, Michael Bloodgood, Bonnie J. Dorr, Nathaniel W. Filardo, Lori Levin, and Christine Piatko. 2010. A modality lexicon and its use in automatic tagging. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), pages 1402-1407, Valletta, Malta, May. European Language Resources Association.

Abstract

This paper describes our resource-building results for an eight-week JHU Human Language Technology Center of Excellence Summer Camp for Applied Language Exploration (SCALE-2009) on Semantically-Informed Machine Translation. Specifically, we describe the construction of a modality annotation scheme, a modality lexicon, and two automated modality taggers that were built using the lexicon and annotation scheme. Our annotation scheme is based on identifying three components of modality: a trigger, a target, and a holder. We describe how our modality lexicon was produced semi-automatically, expanding from an initial hand-selected list of modality trigger words and phrases. The resulting expanded modality lexicon is being made publicly available. We demonstrate that one tagger, a structure-based tagger, achieves precision of around 86% (depending on genre) when tagging a standard LDC data set. In a machine translation application, using the structure-based tagger to annotate English modalities on an English-Urdu training corpus improved the translation quality score for Urdu by 0.3 BLEU points in the face of sparse training data.
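
To make the three-part annotation scheme concrete, the following is a minimal sketch of how a trigger, target, and holder might be recorded by a lexicon lookup. It is not the paper's structure-based tagger: it uses plain token matching, and the lexicon entries, the modality labels, the `ModalityAnnotation` class, and the neighbouring-token heuristics for holder and target are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical toy lexicon (not the published one): trigger phrase -> label.
TOY_LEXICON: Dict[str, str] = {
    "must": "requirement",
    "want to": "desire",
    "able to": "ability",
}


@dataclass
class ModalityAnnotation:
    trigger: str           # word or phrase that signals the modality
    target: Optional[str]  # what the modality applies to
    holder: Optional[str]  # who holds the modality
    label: str             # modality label taken from the lexicon


def tag_modalities(sentence: str,
                   lexicon: Dict[str, str] = TOY_LEXICON) -> List[ModalityAnnotation]:
    """Find lexicon triggers in a sentence by token matching.

    Assumption for illustration only: the holder is the token just before
    the trigger and the target is the token just after it. A real tagger
    would use syntactic structure instead.
    """
    tokens = sentence.lower().rstrip(".!?").split()
    # Try longer trigger phrases first so "want to" wins over a shorter overlap.
    phrases = sorted(lexicon, key=lambda p: -len(p.split()))
    annotations: List[ModalityAnnotation] = []
    i = 0
    while i < len(tokens):
        for phrase in phrases:
            words = phrase.split()
            if tokens[i:i + len(words)] == words:
                end = i + len(words)
                annotations.append(ModalityAnnotation(
                    trigger=phrase,
                    target=tokens[end] if end < len(tokens) else None,
                    holder=tokens[i - 1] if i > 0 else None,
                    label=lexicon[phrase],
                ))
                i = end
                break
        else:
            # No trigger starts at this token; move on.
            i += 1
    return annotations


if __name__ == "__main__":
    for annotation in tag_modalities("Americans want to vote."):
        print(annotation)
```

Under these assumptions, the sentence "Americans want to vote." yields a single annotation with trigger "want to", holder "americans", and target "vote".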

URI (handle)

http://hdl.handle.net/1903/15554

Collections

Center for Advanced Study of Language Research Works
