SEARCHING HETEROGENEOUS DOCUMENT IMAGE COLLECTIONS

Jain, Rajiv

dc.contributor.advisor	Doermann, David	en_US
dc.contributor.advisor	Jacobs, David	en_US
dc.contributor.author	Jain, Rajiv	en_US
dc.contributor.department	Computer Science	en_US
dc.contributor.publisher	Digital Repository at the University of Maryland	en_US
dc.contributor.publisher	University of Maryland (College Park, Md.)	en_US
dc.date.accessioned	2015-06-26T05:45:31Z
dc.date.available	2015-06-26T05:45:31Z
dc.date.issued	2015	en_US
dc.description.abstract	A decrease in data storage costs and widespread use of scanning devices have led to massive quantities of scanned digital documents in corporations, organizations, and governments around the world. Automatically processing these large heterogeneous collections can be difficult due to considerable variation in resolution, quality, font, layout, noise, and content. To make this data available to a wide audience, methods for efficient retrieval and analysis from large collections of document images remain an open and important area of research. In this proposal, we present research in three areas that augment the current state of the art in the retrieval and analysis of large heterogeneous document image collections. First, we explore an efficient approach to document image retrieval, which allows users to search large image collections in a query-by-example manner. We compare our approach to text retrieval over OCR output on a collection of 7 million document images collected from lawsuits against tobacco companies. Next, we present research in document verification and change detection, where one may want to quickly determine whether two document images contain any differences (document verification) and, if so, to determine precisely what changes have occurred and where (change detection). A motivating example is legal contracts, where scanned images are often e-mailed back and forth and small changes can have severe ramifications. Finally, we examine approaches that exploit the biometric properties of handwriting to perform writer identification and retrieval in document images.	en_US
dc.identifier	https://doi.org/10.13016/M2XP66
dc.identifier.uri	http://hdl.handle.net/1903/16674
dc.language.iso	en	en_US
dc.subject.pqcontrolled	Computer science	en_US
dc.subject.pqcontrolled	Information technology	en_US
dc.subject.pquncontrolled	Change Detection	en_US
dc.subject.pquncontrolled	Document Image	en_US
dc.subject.pquncontrolled	Information Retrieval	en_US
dc.subject.pquncontrolled	Writer Identification	en_US
dc.title	SEARCHING HETEROGENEOUS DOCUMENT IMAGE COLLECTIONS	en_US
dc.type	Dissertation	en_US

Files

Original bundle

Name: Jain_umd_0117E_16169.pdf
Size: 5.16 MB
Format: Adobe Portable Document Format

Collections

UMD Theses and Dissertations
Computer Science Theses and Dissertations