Handwriting identification, matching, and indexing in noisy document images
Doermann, David S
Throughout history, handwriting has been the primary means of recording information that is preserved across both time and space. With the coming of the electronic document era, we face the challenge of making an enormous number of handwritten documents available for electronic access. Although many handwritten documents contain only handwriting, a growing number mix handwriting with printed text, noise, and background patterns. This mixture of handwriting with other components presents a great challenge for making an original document electronically accessible.
Many handwritten documents include a particular background pattern: rule lines, which are printed on the paper to guide writing. After digitization, rule lines touch text and cause problems for further document image analysis if they are not detected and removed. In this dissertation, we present a rule line detection algorithm based on hidden Markov model (HMM) decoding, which achieves both high detection accuracy and a low false alarm rate. After detection, the lines are removed by line width thresholding.
Handwriting is often mixed with printed text, for example as signatures and annotations on a business letter. Handwriting in a printed document often indicates corrections, additions, or other supplemental information that should be treated differently from the main content. The data set we process is noisy, which makes the problem more challenging. In this dissertation, we first segment the document at a suitable level and then classify each segmented block as machine-printed text, handwriting, or noise. Markov random field (MRF) based post-processing is then used to refine the classification results.
The identified handwriting may be analyzed further. In this dissertation, we propose a novel point-pattern based handwriting matching technique and apply it to handwriting synthesis and retrieval. We formulate point matching as an optimization problem that seeks to preserve local neighborhood structures. After establishing the correspondence between two handwriting samples, we warp one sample toward the other using the thin plate spline (TPS) deformation model to synthesize new handwriting samples. We also apply our matching algorithm to handwriting retrieval, since it is much easier to define robust features based on the matching results.
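The TPS warping step can be illustrated with a minimal sketch. It assumes the point correspondences are already known, and it fits an exact-interpolation thin plate spline with no smoothing term; the abstract does not say whether the dissertation regularizes the fit. The function names (`tps_kernel`, `solve_linear`, `fit_tps`, `apply_tps`) are hypothetical and chosen only for this illustration.

```python
import math

def tps_kernel(r):
    """Radial basis U(r) = r^2 log r, with U(0) defined as 0."""
    return 0.0 if r == 0.0 else r * r * math.log(r)

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.
    a is n x n and b is n x m (lists of lists); neither is modified."""
    n = len(a)
    m = len(b[0])
    aug = [list(row_a) + list(row_b) for row_a, row_b in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: points may be collinear or repeated")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + m):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [[0.0] * m for _ in range(n)]
    for r in range(n - 1, -1, -1):
        for k in range(m):
            s = aug[r][n + k] - sum(aug[r][c] * x[c][k] for c in range(r + 1, n))
            x[r][k] = s / aug[r][r]
    return x

def fit_tps(src, dst):
    """Fit a 2-D TPS that maps each src[i] exactly onto dst[i].
    Returns n kernel-weight rows followed by 3 affine rows, 2 columns each."""
    n = len(src)
    size = n + 3
    system = [[0.0] * size for _ in range(size)]
    rhs = [[0.0, 0.0] for _ in range(size)]
    for i, (xi, yi) in enumerate(src):
        for j, (xj, yj) in enumerate(src):
            system[i][j] = tps_kernel(math.hypot(xi - xj, yi - yj))
        for k, value in enumerate((1.0, xi, yi)):
            system[i][n + k] = value   # affine block P
            system[n + k][i] = value   # side conditions P^T w = 0
        rhs[i] = [float(dst[i][0]), float(dst[i][1])]
    return solve_linear(system, rhs)

def apply_tps(point, src, params):
    """Warp one (x, y) point with parameters returned by fit_tps."""
    x, y = point
    n = len(src)
    out = []
    for dim in range(2):
        value = params[n][dim] + params[n + 1][dim] * x + params[n + 2][dim] * y
        for i, (xi, yi) in enumerate(src):
            value += params[i][dim] * tps_kernel(math.hypot(x - xi, y - yi))
        out.append(value)
    return (out[0], out[1])
```

In this sketch, `src` would hold landmark points sampled from one handwriting sample and `dst` the matched points from the other. Calling `apply_tps` on every stroke point of the first sample then deforms it toward the second. One plausible way to generate intermediate shapes is to move the target landmarks only part of the way from `src` to `dst` before fitting, though that is an assumption rather than something the abstract states.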