COMPUTER VISION AND DEEP LEARNING WITH APPLICATIONS TO OBJECT DETECTION, SEGMENTATION, AND DOCUMENT ANALYSIS
Abstract
There are three works on signature matching for document analysis. In the first work, we propose a large-scale signature matching method based on locality sensitive hashing (LSH). Shape Context features are used to describe the structure of signatures, and two stages of hashing are performed to find the nearest neighbors for query signatures. We show that our algorithm achieves high accuracy even when few signatures are collected from the same person, and that it performs fast matching on large datasets. In the second work, we present a novel signature matching method based on supervised topic models. Shape Context features are extracted from signature shape contours to capture local variations in signature properties. We then use topic models to learn the shape context features that correspond to individual authors, and demonstrate considerable improvement over state-of-the-art methods. In the third work, we present a partial signature matching method using graphical models. Extending the second work, modified shape context features are extracted from signature contours to describe both full and partial signatures. Hierarchical Dirichlet processes are used to infer the number of salient regions needed. The results show the effectiveness of the approach for both partial and full signature matching.
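As an illustration of the two-stage hashing idea in the first work, the following sketch indexes fixed-length Shape Context descriptors with random-hyperplane LSH. The function names, bit widths, and the choice of hyperplane hashes are illustrative assumptions, not the dissertation's exact scheme.

```python
import numpy as np

def make_hash(dim, n_bits, rng):
    """Random-hyperplane LSH: the sign pattern of projections is the bucket key."""
    planes = rng.standard_normal((n_bits, dim))
    return lambda x: tuple(bool(b) for b in (planes @ x) > 0)

def build_index(features, dim, rng, bits_coarse=8, bits_fine=16):
    """Two-stage index: a coarse table whose buckets hold finer sub-tables."""
    h1 = make_hash(dim, bits_coarse, rng)
    h2 = make_hash(dim, bits_fine, rng)
    index = {}
    for i, f in enumerate(features):
        index.setdefault(h1(f), {}).setdefault(h2(f), []).append(i)
    return index, h1, h2

def query(index, h1, h2, features, q):
    """Candidates from the matching fine bucket, ranked by Euclidean distance."""
    candidates = index.get(h1(q), {}).get(h2(q), [])
    return sorted(candidates, key=lambda i: np.linalg.norm(features[i] - q))
```

A practical system would probe several nearby buckets and maintain multiple hash tables to raise recall; the sketch shows only the two-stage lookup itself.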
There are three works on deep learning for object detection and segmentation. In the first work, we propose a deep neural network fusion architecture for fast and robust pedestrian detection. The proposed architecture allows multiple networks to be run in parallel for speed. A single-shot deep convolutional network is trained as an object detector to generate all possible pedestrian candidates of different sizes and occlusion levels. Next, multiple deep neural networks are used in parallel to further refine these candidates. We introduce a soft-rejection based network fusion method that fuses the soft metrics from all networks into final confidence scores. Our method outperforms existing state-of-the-art methods, especially when detecting small or occluded pedestrians. Furthermore, we propose a method for integrating a pixel-wise semantic segmentation network into the fusion architecture as a reinforcement to the pedestrian detector. In the second work, extending the first, a fusion network is trained to fuse the multiple classification networks.
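The soft-rejection idea described above can be sketched as follows: each secondary classifier contributes a multiplicative factor to the detector's confidence rather than a hard accept/reject vote. The thresholding rule and the threshold and floor values below are illustrative assumptions, not the dissertation's exact settings.

```python
def soft_rejection_fuse(det_score, clf_probs, t=0.7, floor=0.1):
    """Fuse a detector confidence with secondary classifier probabilities.

    A classifier probability at or above the threshold `t` leaves the
    score unchanged; lower probabilities shrink it proportionally, but
    never by more than the `floor` factor, so no single network can veto
    a candidate outright. (`t` and `floor` are illustrative values.)
    """
    score = det_score
    for p in clf_probs:
        score *= 1.0 if p >= t else max(p / t, floor)
    return score
```

The floor is the essential design choice: a candidate that one classifier scores badly is attenuated but kept, so agreement among the remaining networks can still recover it.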
Furthermore, a novel soft-label method is devised to assign floating-point labels to the pedestrian candidates; the label for each candidate detection is derived from the overlap of its bounding box with the ground-truth boxes of the different classes. In the third work, we propose a boundary-sensitive deep neural network architecture for portrait segmentation. A framework based on residual networks and atrous convolution is trained as the base portrait segmentation network. To better handle boundary segmentation, three techniques are introduced. First, an individual boundary-sensitive kernel is constructed by labeling the boundary pixels as a separate class and using the soft-label strategy to assign floating-point label vectors to pixels in the boundary class; each such pixel contributes to multiple classes in the loss according to its position relative to the contour. Second, a global boundary-sensitive kernel assigns different weights to pixel locations in the loss to constrain the global shape of the resulting segmentation map. Third, we add multiple binary classifiers that predict boundary-sensitive portrait attributes, so as to refine the learning process of our model.
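The individual boundary-sensitive kernel can be illustrated with a toy sketch that assigns each pixel a soft label vector over (background, foreground, boundary), assuming a distance-to-contour map is already available. The linear split of label mass inside the boundary band is an illustrative assumption, not the dissertation's exact kernel.

```python
import numpy as np

def boundary_soft_labels(fg_mask, dist_to_contour, band=3.0):
    """Per-pixel soft label vectors over (background, foreground, boundary).

    Inside a band of width `band` around the contour, each pixel's unit
    label mass is split linearly between the boundary class and its own
    fg/bg class; outside the band, labels stay one-hot. So a pixel on
    the contour is pure boundary, and the boundary weight fades with
    distance. (The linear weighting is an illustrative choice.)
    """
    h, w = fg_mask.shape
    labels = np.zeros((h, w, 3))
    w_boundary = np.clip(1.0 - dist_to_contour / band, 0.0, 1.0)
    labels[..., 2] = w_boundary                      # boundary class
    labels[..., 0] = (fg_mask == 0) * (1.0 - w_boundary)  # background
    labels[..., 1] = (fg_mask == 1) * (1.0 - w_boundary)  # foreground
    return labels  # each pixel's vector sums to 1
```

Training against such vectors (e.g. with a cross-entropy loss over the three channels) lets pixels near the contour contribute to multiple classes at once, which is the effect the boundary-sensitive kernel is after.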