SEARCHING HETEROGENEOUS DOCUMENT IMAGE COLLECTIONS

dc.contributor.advisor: Doermann, David [en_US]
dc.contributor.advisor: Jacobs, David [en_US]
dc.contributor.author: Jain, Rajiv [en_US]
dc.contributor.department: Computer Science [en_US]
dc.contributor.publisher: Digital Repository at the University of Maryland [en_US]
dc.contributor.publisher: University of Maryland (College Park, Md.) [en_US]
dc.date.accessioned: 2015-06-26T05:45:31Z
dc.date.available: 2015-06-26T05:45:31Z
dc.date.issued: 2015 [en_US]
dc.description.abstract: A decrease in data storage costs and the widespread use of scanning devices have led to massive quantities of scanned digital documents in corporations, organizations, and governments around the world. Automatically processing these large heterogeneous collections can be difficult due to considerable variation in resolution, quality, font, layout, noise, and content. To make this data available to a wide audience, methods for efficient retrieval and analysis from large collections of document images remain an open and important area of research. In this proposal, we present research in three areas that augment the current state of the art in the retrieval and analysis of large heterogeneous document image collections. First, we explore an efficient approach to document image retrieval, which allows users to perform retrieval against large image collections in a query-by-example manner. Our approach is compared with text retrieval on OCR output, using a collection of 7 million document images collected from lawsuits against tobacco companies. Next, we present research in document verification and change detection, where one may want to quickly determine whether two document images contain any differences (document verification) and, if so, to determine precisely what has changed and where (change detection). A motivating example is legal contracts, where scanned images are often e-mailed back and forth and small changes can have severe ramifications. Finally, we examine approaches that exploit the biometric properties of handwriting to perform writer identification and retrieval in document images. [en_US]
dc.identifier: https://doi.org/10.13016/M2XP66
dc.identifier.uri: http://hdl.handle.net/1903/16674
dc.language.iso: en [en_US]
dc.subject.pqcontrolled: Computer science [en_US]
dc.subject.pqcontrolled: Information technology [en_US]
dc.subject.pquncontrolled: Change Detection [en_US]
dc.subject.pquncontrolled: Document Image [en_US]
dc.subject.pquncontrolled: Information Retrieval [en_US]
dc.subject.pquncontrolled: Writer Identification [en_US]
dc.title: SEARCHING HETEROGENEOUS DOCUMENT IMAGE COLLECTIONS [en_US]
dc.type: Dissertation [en_US]

Files

Original bundle

Name: Jain_umd_0117E_16169.pdf
Size: 5.16 MB
Format: Adobe Portable Document Format