Machine Printed Text and Handwriting Identification in Noisy Document Images

dc.contributor.author: Zheng, Yefeng [en_US]
dc.contributor.author: Li, Huiping [en_US]
dc.contributor.author: Doermann, David [en_US]
dc.date.accessioned: 2004-05-31T23:32:51Z
dc.date.available: 2004-05-31T23:32:51Z
dc.date.created: 2003-09 [en_US]
dc.date.issued: 2003-09-25 [en_US]
dc.description.abstract: In this paper we address the problem of identifying text in noisy document images. We focus in particular on segmenting and distinguishing handwriting from machine printed text because: 1) handwriting in a document often indicates corrections, additions, or other supplemental information that should be treated differently from the main content, and 2) the segmentation and recognition techniques required for machine printed and handwritten text are significantly different. A novel aspect of our approach is that we treat noise as a separate class and model noise based on selected features. Trained Fisher classifiers are used to separate machine printed text and handwriting from noise, and we further exploit context to refine the classification. A Markov Random Field (MRF) based approach is used to model the geometrical structure of the printed text, handwriting, and noise to rectify misclassifications. Experimental results show that our approach is robust and can significantly improve page segmentation in noisy document collections. (LAMP-TR-107) (CAR-TR-992) (UMIACS-TR-2003-99) [en_US]
dc.format.extent: 873710 bytes
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/1903/1316
dc.language.iso: en_US
dc.relation.isAvailableAt: Digital Repository at the University of Maryland [en_US]
dc.relation.isAvailableAt: University of Maryland (College Park, Md.) [en_US]
dc.relation.isAvailableAt: Tech Reports in Computer Science and Engineering [en_US]
dc.relation.isAvailableAt: UMIACS Technical Reports [en_US]
dc.relation.ispartofseries: UM Computer Science Department; CS-TR-4531 [en_US]
dc.relation.ispartofseries: LAMP-TR-107 [en_US]
dc.relation.ispartofseries: CAR-TR-992 [en_US]
dc.relation.ispartofseries: UMIACS; UMIACS-TR-2003-99 [en_US]
dc.title: Machine Printed Text and Handwriting Identification in Noisy Document Images [en_US]
dc.type: Technical Report [en_US]
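
The abstract mentions trained Fisher classifiers but this record does not give the features or the training procedure. As a rough illustration only, the following is a minimal sketch of a two-class Fisher linear discriminant on 2-D feature vectors. The feature values, the class labels "printed" and "noise", and the midpoint threshold are all assumptions made for this example; the report itself covers three classes (machine printed text, handwriting, and noise) and adds MRF-based refinement, neither of which is shown here.

```python
def fisher_discriminant(class_a, class_b):
    """Fit a two-class Fisher linear discriminant on 2-D points.

    Returns (w, threshold), where w = Sw^-1 (mean_a - mean_b) and the
    threshold is the midpoint of the two projected class means
    (an assumed, simple decision rule).
    """
    def mean(points):
        n = len(points)
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

    def scatter(points, m):
        # Entries of the symmetric 2x2 within-class scatter matrix.
        sxx = sum((p[0] - m[0]) ** 2 for p in points)
        sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
        syy = sum((p[1] - m[1]) ** 2 for p in points)
        return sxx, sxy, syy

    ma, mb = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    sxx, sxy, syy = sa[0] + sb[0], sa[1] + sb[1], sa[2] + sb[2]

    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")

    # w = Sw^-1 (ma - mb), using the closed-form inverse of a 2x2 matrix.
    dx, dy = ma[0] - mb[0], ma[1] - mb[1]
    w = ((syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det)

    proj_a = w[0] * ma[0] + w[1] * ma[1]
    proj_b = w[0] * mb[0] + w[1] * mb[1]
    return w, (proj_a + proj_b) / 2.0


def classify(x, w, threshold, label_a="printed", label_b="noise"):
    """Project x onto w and compare against the threshold."""
    score = w[0] * x[0] + w[1] * x[1]
    return label_a if score > threshold else label_b


# Hypothetical 2-D feature vectors, invented for this sketch.
printed = [(1.0, 0.2), (1.2, 0.1), (0.9, 0.3), (1.1, 0.25)]
noise = [(0.2, 0.9), (0.1, 1.1), (0.3, 0.8), (0.25, 1.0)]

w, threshold = fisher_discriminant(printed, noise)
print(classify((1.0, 0.25), w, threshold))
print(classify((0.2, 1.0), w, threshold))
```

Extending this to the three classes named in the abstract would need either several pairwise discriminants or a multi-class formulation; which of these the authors use is not stated in this record.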

Files

Original bundle

Name: CS-TR-4531.pdf
Size: 853.23 KB
Format: Adobe Portable Document Format