DOCUMENT AND NATURAL IMAGE APPLICATIONS OF DEEP LEARNING

Kang, Le

DOCUMENT AND NATURAL IMAGE APPLICATIONS OF DEEP LEARNING

Files

Kang_umd_0117E_16490.pdf (26.03 MB)

No. of downloads: 724

Date

2015

Authors

Kang, Le

Advisor

Chellappa, Rama
Doermann, David

DRUM DOI

https://doi.org/10.13016/M2RW73

Abstract

A tremendous amount of digital visual data is being collected every day, and we need efficient and effective algorithms to extract useful information from that data. Considering the complexity of visual data and the expense of human labor, we expect algorithms to have enhanced generalization capability and depend less on domain knowledge. While many topics in computer vision have benefited from machine learning, some document analysis and image quality assessment problems still have not found the best way to utilize it. In the context of document images, a compelling need exists for reliable methods to categorize and extract key information from captured images. In natural image content analysis, accurate quality assessment has become a critical component for many applications. Most current approaches, however, rely on the heuristics designed by human observations on severely limited data. These approaches typically work only on specific types of images and are hard to generalize on complex data from real applications.

This dissertation looks to address the challenges of processing heterogeneous visual data by applying effective learning methods that directly model the data with minimal preprocessing and feature engineering. We focus on three important problems - text line detection, document image categorization, and image quality assessment. The data we work on typically contains unconstrained layouts, styles, or noise, which resemble the real data from applications. First, we present a graph-based method, learning the line structure from training data for text line segmentation in handwritten document images, and a general framework to detect multi-oriented scene text lines using Higher-Order Correlation Clustering. Our method depends less on domain knowledge and is robust to variations in fonts or languages. Second, we introduce a general approach for document image genre classification using Convolutional Neural Networks (CNN). The introduction of CNNs for document image genre classification largely reduces the needs of hand-crafted features or domain knowledge. Third, we present our CNN based methods to general-purpose No-Reference Image Quality Assessment (NR-IQA). Our methods bridge the gap between NR-IQA and CNN and opens the door to a broad range of deep learning methods. With excellent local quality estimation ability, our methods demonstrate the state of art performance on both distortion identification and quality estimation.

URI (handle)

http://hdl.handle.net/1903/17042

Collections

UMD Theses and Dissertations
Electrical & Computer Engineering Theses and Dissertations

Full item page