Machine Printed Text and Handwriting Identification in Noisy Document Images

Date
2003-09-25
Author
Zheng, Yefeng
Li, Huiping
Doermann, David
Abstract
In this paper we address the problem of identifying text in noisy
document images. We focus in particular on segmenting handwriting and
machine printed text and telling the two apart because: 1) handwriting in a
document often indicates corrections, additions, or other supplemental
information that should be treated differently from the main content, and
2) the segmentation and recognition techniques required for machine
printed and handwritten text are significantly different. A novel aspect
of our approach is that we treat noise as a separate class and model noise
based on selected features. Trained Fisher classifiers are used to
distinguish machine printed text and handwriting from noise, and we further
exploit context to refine the classification. A Markov Random Field (MRF)
based approach is used to model the geometrical structure of the printed
text, handwriting, and noise to rectify misclassifications. Experimental
results show that our approach is robust and can significantly improve
page segmentation in noisy document collections.
(LAMP-TR-107)
(CAR-TR-992)
(UMIACS-TR-2003-99)
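
The abstract names Fisher classifiers as the first classification stage but gives no details. As an illustration only, here is a minimal sketch of a generic two-class Fisher linear discriminant in Python with NumPy. It is not the authors' implementation: the function names (fit_fisher, predict_fisher), the two-dimensional synthetic features, the small ridge term, and the midpoint threshold are all assumptions made for this example, and the features selected in the report are not reproduced here.

```python
import numpy as np


def fit_fisher(x_a, x_b):
    """Fit a two-class Fisher linear discriminant (hypothetical helper).

    x_a, x_b: arrays of shape (n_samples, n_features), one per class.
    Returns (w, threshold); a sample x falls in class A when x @ w > threshold.
    """
    mean_a, mean_b = x_a.mean(axis=0), x_b.mean(axis=0)
    # Within-class scatter: sum of the two per-class scatter matrices.
    s_w = (x_a - mean_a).T @ (x_a - mean_a) + (x_b - mean_b).T @ (x_b - mean_b)
    # Small ridge term (an assumption) keeps the solve stable if s_w is near-singular.
    s_w = s_w + 1e-6 * np.eye(s_w.shape[0])
    # Fisher direction: maximizes between-class over within-class scatter.
    w = np.linalg.solve(s_w, mean_a - mean_b)
    # Threshold at the midpoint of the projected class means (an assumption).
    threshold = 0.5 * (w @ mean_a + w @ mean_b)
    return w, threshold


def predict_fisher(w, threshold, x):
    """Return a boolean array: True where a row of x is assigned to class A."""
    return (x @ w) > threshold


# Synthetic stand-ins for feature vectors of text blocks and noise blocks.
rng = np.random.default_rng(0)
text_features = rng.normal([2.0, 2.0], 0.5, size=(50, 2))
noise_features = rng.normal([-2.0, -2.0], 0.5, size=(50, 2))

w, threshold = fit_fisher(text_features, noise_features)
is_text = predict_fisher(w, threshold, np.vstack([text_features, noise_features]))
```

A three-class problem such as machine printed text, handwriting, and noise could be handled by combining several such two-class discriminants; how the report combines them, and how the MRF stage then revises the labels using neighboring blocks, is not stated in the abstract.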