Adaptive Analysis and Processing of Structured Multilingual Documents

dc.contributor.advisor: Chellappa, Rama [en_US]
dc.contributor.advisor: Doermann, David S [en_US]
dc.contributor.author: Ma, Huanfeng [en_US]
dc.description.abstract: Digital document processing is becoming popular for application to office and library automation, bank and postal services, publishing houses and communication management. In recent years, the demand for tools capable of searching written and spoken sources of multilingual information has increased tremendously, and the bilingual dictionary is one of the important resources for providing the required information. Processing and analyzing bilingual dictionaries raises the challenge of dealing with many different scripts, some of which are unknown to the designer. A framework is presented to adaptively analyze and process structured multilingual documents, where adaptability is applied to every step. The proposed framework involves: (1) general word-level script identification using Gabor filters; (2) font classification using the grating cell operator; (3) general word-level style identification using a Gaussian mixture model; (4) an adaptable Hindi OCR based on generalized Hausdorff image comparison; (5) retargetable OCR with automatic training sample creation and its applications to different scripts; (6) bootstrapping entry segmentation, which segments each page into functional entries for parsing. Experimental results on different scripts, such as Chinese, Korean, Arabic, Devanagari, and Khmer, demonstrate that the proposed framework can save significant human effort by making each phase adaptive. [en_US]
dc.format.extent: 7876031 bytes
dc.title: Adaptive Analysis and Processing of Structured Multilingual Documents [en_US]
dc.contributor.publisher: Digital Repository at the University of Maryland [en_US]
dc.contributor.publisher: University of Maryland (College Park, Md.) [en_US]
dc.contributor.department: Electrical Engineering [en_US]
dc.subject.pqcontrolled: Engineering, Electronics and Electrical [en_US]
dc.subject.pqcontrolled: Computer Science [en_US]
dc.subject.pqcontrolled: Information Science [en_US]
dc.subject.pquncontrolled: Document Analysis [en_US]
dc.subject.pquncontrolled: Pattern Recognition [en_US]
dc.subject.pquncontrolled: Computer Vision [en_US]
dc.subject.pquncontrolled: Multilingual Documents [en_US]
