Detecting Structural Irregularity in Electronic Dictionaries Using Language Modeling

Rodrigues, Paul; Zajic, David; Doermann, David; Bloodgood, Michael; Ye, Peng

dc.contributor.author	Rodrigues, Paul
dc.contributor.author	Zajic, David
dc.contributor.author	Doermann, David
dc.contributor.author	Bloodgood, Michael
dc.contributor.author	Ye, Peng
dc.date.accessioned	2014-08-20T21:25:25Z
dc.date.available	2014-08-20T21:25:25Z
dc.date.issued	2011-11
dc.description.abstract	Dictionaries are often developed using tools that save to Extensible Markup Language (XML)-based standards. These standards often allow high-level repeating elements to represent lexical entries, and utilize descendants of these repeating elements to represent the structure within each lexical entry, in the form of an XML tree. In many cases, dictionaries are published that have errors and inconsistencies that are expensive to find manually. This paper discusses a method for dictionary writers to quickly audit structural regularity across entries in a dictionary by using statistical language modeling. The approach learns the patterns of XML nodes that could occur within an XML tree, and then calculates the probability of each XML tree in the dictionary against these patterns to look for entries that diverge from the norm.	en_US
dc.description.sponsorship	This material is based upon work supported, in whole or in part, with funding from the United States Government. Any opinions, findings and conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the University of Maryland, College Park and/or any agency or entity of the United States Government. Nothing in this report is intended to be and shall not be treated or construed as an endorsement or recommendation by the University of Maryland, United States Government, or the authors of the product, process, or service that is the subject of this report. No one may use any information contained or based on this report in advertisements or promotional materials related to any company product, process, or service or in support of other commercial purposes.	en_US
dc.identifier	https://doi.org/10.13016/M2WC75
dc.identifier.citation	Paul Rodrigues, David Zajic, David Doermann, Michael Bloodgood, and Peng Ye. 2011. Detecting structural irregularity in electronic dictionaries using language modeling. In Proceedings of Electronic Lexicography in the 21st Century (eLex), pages 227-232, Bled, Slovenia, November. Trojina Institute for Applied Slovene Studies.	en_US
dc.identifier.uri	http://hdl.handle.net/1903/15576
dc.language.iso	en_US	en_US
dc.publisher	Trojina Institute for Applied Slovene Studies	en_US
dc.relation.isAvailableAt	Center for Advanced Study of Language
dc.relation.isAvailableAt	Digital Repository at the University of Maryland
dc.relation.isAvailableAt	University of Maryland (College Park, Md)
dc.subject	computer science	en_US
dc.subject	statistical methods	en_US
dc.subject	computational linguistics	en_US
dc.subject	natural language processing	en_US
dc.subject	human language technology	en_US
dc.subject	electronic lexicography	en_US
dc.subject	XML	en_US
dc.subject	language modeling	en_US
dc.subject	anomaly detection	en_US
dc.subject	error correction	en_US
dc.subject	electronic dictionaries	en_US
dc.title	Detecting Structural Irregularity in Electronic Dictionaries Using Language Modeling	en_US
dc.type	Article	en_US
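
The approach summarized in the abstract (learn which XML node patterns occur across entries, then score each entry's tree against those patterns) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the add-one-smoothed bigram model, the depth-first linearization of each tree into open/close tag tokens, the hypothetical `entry` element name, and the function names `linearize`, `train_bigrams`, `avg_log_prob`, and `rank_entries`.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def linearize(node):
    """Flatten an XML subtree into a depth-first sequence of open/close tag tokens."""
    tokens = ["<" + node.tag + ">"]
    for child in node:
        tokens.extend(linearize(child))
    tokens.append("</" + node.tag + ">")
    return tokens


def train_bigrams(sequences):
    """Count tag bigrams over all entries (illustrative model, not the paper's)."""
    context_counts, bigram_counts = Counter(), Counter()
    vocab = set()
    for seq in sequences:
        padded = ["<s>"] + seq  # "<s>" marks the start of an entry
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, len(vocab)


def avg_log_prob(seq, context_counts, bigram_counts, vocab_size):
    """Average per-token log-probability of one entry, with add-one smoothing."""
    padded = ["<s>"] + seq
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        p = (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + vocab_size)
        total += math.log(p)
    return total / len(seq)


def rank_entries(xml_text, entry_tag="entry"):
    """Return (score, entry_index) pairs, least probable structure first."""
    root = ET.fromstring(xml_text)
    sequences = [linearize(entry) for entry in root.findall(entry_tag)]
    model = train_bigrams(sequences)
    return sorted((avg_log_prob(s, *model), i) for i, s in enumerate(sequences))


if __name__ == "__main__":
    regular = "<entry><form/><sense><def/></sense></entry>"
    irregular = "<entry><sense><form/></sense><def/></entry>"
    sample = "<dictionary>" + regular * 5 + irregular + "</dictionary>"
    for score, index in rank_entries(sample):
        print(index, round(score, 3))
```

Entries at the top of the ranking are the ones whose structure diverges most from the rest of the dictionary under this toy model, so they are the first candidates for manual review. Averaging the log-probability over tokens is a design choice in this sketch, made so that long entries are not penalized simply for being long.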

Files

Original bundle

Name: detectingStructuralIrregularity_eLex2011.pdf
Size: 1.14 MB
Format: Adobe Portable Document Format

Collections

Center for Advanced Study of Language Research Works