DOCUMENT AND NATURAL IMAGE APPLICATIONS OF DEEP LEARNING

Kang, Le

dc.contributor.advisor	Chellappa, Rama	en_US
dc.contributor.advisor	Doermann, David	en_US
dc.contributor.author	Kang, Le	en_US
dc.contributor.department	Electrical Engineering	en_US
dc.contributor.publisher	Digital Repository at the University of Maryland	en_US
dc.contributor.publisher	University of Maryland (College Park, Md.)	en_US
dc.date.accessioned	2015-09-18T05:52:47Z
dc.date.available	2015-09-18T05:52:47Z
dc.date.issued	2015	en_US
dc.description.abstract	A tremendous amount of digital visual data is collected every day, and we need efficient and effective algorithms to extract useful information from it. Given the complexity of visual data and the expense of human labor, we expect algorithms to generalize better and to depend less on domain knowledge. While many topics in computer vision have benefited from machine learning, some document analysis and image quality assessment problems have not yet found the best way to use it. In the context of document images, there is a compelling need for reliable methods to categorize captured images and extract key information from them. In natural image content analysis, accurate quality assessment has become a critical component of many applications. Most current approaches, however, rely on heuristics designed from human observation of severely limited data. These approaches typically work only on specific types of images and are hard to generalize to complex data from real applications. This dissertation aims to address the challenges of processing heterogeneous visual data by applying effective learning methods that directly model the data with minimal preprocessing and feature engineering. We focus on three important problems: text line detection, document image categorization, and image quality assessment. The data we work on typically contains unconstrained layouts, styles, or noise, and so resembles real data from applications. First, we present a graph-based method that learns the line structure from training data for text line segmentation in handwritten document images, and a general framework for detecting multi-oriented scene text lines using Higher-Order Correlation Clustering. Our method depends less on domain knowledge and is robust to variations in fonts or languages. Second, we introduce a general approach for document image genre classification using Convolutional Neural Networks (CNNs). 
The introduction of CNNs for document image genre classification largely reduces the need for hand-crafted features or domain knowledge. Third, we present our CNN-based methods for general-purpose No-Reference Image Quality Assessment (NR-IQA). Our methods bridge the gap between NR-IQA and CNNs and open the door to a broad range of deep learning methods. With excellent local quality estimation ability, our methods demonstrate state-of-the-art performance on both distortion identification and quality estimation.	en_US
dc.identifier	https://doi.org/10.13016/M2RW73
dc.identifier.uri	http://hdl.handle.net/1903/17042
dc.language.iso	en	en_US
dc.subject.pqcontrolled	Electrical engineering	en_US
dc.subject.pqcontrolled	Computer science	en_US
dc.title	DOCUMENT AND NATURAL IMAGE APPLICATIONS OF DEEP LEARNING	en_US
dc.type	Dissertation	en_US

Files

Original bundle

Name: Kang_umd_0117E_16490.pdf
Size: 26.03 MB
Format: Adobe Portable Document Format

Collections

UMD Theses and Dissertations
Electrical & Computer Engineering Theses and Dissertations