Handwriting identification, matching, and indexing in noisy document images

dc.contributor.advisor: Chellappa, Rama [en_US]
dc.contributor.advisor: Doermann, David S [en_US]
dc.contributor.author: Zheng, Yefeng [en_US]
dc.contributor.department: Electrical Engineering [en_US]
dc.contributor.publisher: Digital Repository at the University of Maryland [en_US]
dc.contributor.publisher: University of Maryland (College Park, Md.) [en_US]
dc.date.accessioned: 2006-02-04T08:11:07Z
dc.date.available: 2006-02-04T08:11:07Z
dc.date.issued: 2005-12-19 [en_US]
dc.description.abstract: Throughout history, handwriting has been the primary means of recording information that is preserved across both time and space. With the coming of the electronic document era, we are challenged with making an enormous number of handwritten documents available for electronic access. Although many handwritten documents contain only handwriting, an increasing number now mix handwriting with printed text, noise, and background patterns. This mixture of handwriting with other components presents a great challenge for making an original document electronically accessible. Many handwritten documents carry a special background pattern: rule lines, which are printed on the paper to guide writing. After digitization, rule lines touch the text and cause problems for further document image analysis if they are not detected and removed. In this dissertation, we present a rule line detection algorithm based on hidden Markov model (HMM) decoding, achieving both high detection accuracy and a low false alarm rate. After detection, line removal is performed by line width thresholding. Handwriting is often mixed with printed text, as in signatures and annotations on a business letter. Handwriting in a printed document often indicates corrections, additions, or other supplemental information that should be treated differently from the main content. The data set we are processing is noisy, which makes the problem more challenging. In this dissertation, we first segment the document at a suitable level, and then classify each segmented block as machine printed text, handwriting, or noise. Markov random field (MRF) based post-processing is exploited to refine the classification results. The identified handwriting may be further analyzed. In this dissertation, we propose a novel point-pattern based handwriting matching technique and apply it to handwriting synthesis and retrieval. We formulate point matching as an optimization problem that seeks to preserve local neighborhood structures. After establishing the correspondence between two handwriting samples, we warp one sample toward the other using the thin plate spline (TPS) deformation model to synthesize new handwriting samples. We also apply our matching algorithm to handwriting retrieval, since it is much easier to define robust features based on the matching results. [en_US]
dc.format.extent: 2248909 bytes
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/1903/3289
dc.language.iso: en_US
dc.subject.pqcontrolled: Engineering, Electronics and Electrical [en_US]
dc.subject.pqcontrolled: Computer Science [en_US]
dc.subject.pquncontrolled: Document Image Analysis [en_US]
dc.subject.pquncontrolled: Line Detection [en_US]
dc.subject.pquncontrolled: Form Processing [en_US]
dc.subject.pquncontrolled: Nonrigid Shape Matching [en_US]
dc.title: Handwriting identification, matching, and indexing in noisy document images [en_US]
dc.type: Dissertation [en_US]

Files

Original bundle
Name: umi-umd-3118.pdf
Size: 2.14 MB
Format: Adobe Portable Document Format