
GROUNDTRUTH GENERATION AND DOCUMENT IMAGE DEGRADATION

dc.contributor.advisor: Chellappa, Rama (en_US)
dc.contributor.author: Zi, Gang (en_US)
dc.date.accessioned: 2005-08-03T14:38:22Z
dc.date.available: 2005-08-03T14:38:22Z
dc.date.issued: 2005-05-02 (en_US)
dc.identifier.uri: http://hdl.handle.net/1903/2524
dc.description.abstract: The problem of generating synthetic data for the training and evaluation of document analysis systems has been widely addressed in recent years. With the increased interest in processing multilingual sources, however, there is a tremendous need to rapidly generate data in new languages and scripts without developing specialized systems. We have developed a system that uses the language support of the MS Windows operating system, combined with custom print drivers, to render TIFF images simultaneously with Windows Enhanced Metafile directives. The metafile information is parsed to generate zone, line, word, and character ground truth, including location, font information, and content, in any language supported by Windows. The resulting images can be physically or synthetically degraded by our degradation modules and used for training and evaluating Optical Character Recognition (OCR) systems. Our document image degradation methodology incorporates several often-encountered types of noise at the page and pixel levels. Examples of OCR evaluation and synthetically degraded document images are given to demonstrate the effectiveness of the approach. (en_US)
dc.format.extent: 12371891 bytes
dc.format.mimetype: application/pdf
dc.language.iso: en_US
dc.title: GROUNDTRUTH GENERATION AND DOCUMENT IMAGE DEGRADATION (en_US)
dc.type: Thesis (en_US)
dc.contributor.publisher: Digital Repository at the University of Maryland (en_US)
dc.contributor.publisher: University of Maryland (College Park, Md.) (en_US)
dc.contributor.department: Electrical Engineering (en_US)
dc.subject.pqcontrolled: Engineering, Electronics and Electrical (en_US)
dc.subject.pqcontrolled: Computer Science (en_US)

