COMPUTER VISION AND DEEP LEARNING WITH APPLICATIONS TO OBJECT DETECTION, SEGMENTATION, AND DOCUMENT ANALYSIS
Abstract
There are three works on signature matching for document analysis. In the first work, we propose a large-scale signature matching method based on locality sensitive hashing (LSH). Shape Context features are used to describe the structure of signatures, and two stages of hashing are performed to find the nearest neighbors for query signatures. We show that our algorithm achieves high accuracy even when few signatures are collected from the same person, and that it performs fast matching on large datasets. In the second work, we present a novel signature matching method based on supervised topic models. Shape Context features are extracted from signature shape contours to capture local variations in signature properties. We then use topic models to learn the shape context features that correspond to individual authors, and demonstrate considerable improvement over state-of-the-art methods. In the third work, we present a partial signature matching method using graphical models. Extending the second work, modified shape context features are extracted from signature contours to describe both full and partial signatures. Hierarchical Dirichlet processes are used to infer the number of salient regions needed. The results show the effectiveness of the approach for both partial and full signature matching.
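As an illustration of the two-stage hashing idea in the first work, the following sketch indexes fixed-length Shape Context descriptors with random-hyperplane LSH. The function names, bit widths, and the choice of hyperplane hashes are illustrative assumptions, not the dissertation's exact scheme.

```python
import numpy as np

def make_hash(dim, n_bits, rng):
    """Random-hyperplane LSH: the sign pattern of projections is the bucket key."""
    planes = rng.standard_normal((n_bits, dim))
    return lambda x: tuple(bool(b) for b in (planes @ x) > 0)

def build_index(features, dim, rng, bits_coarse=8, bits_fine=16):
    """Two-stage index: a coarse table whose buckets hold finer sub-tables."""
    h1 = make_hash(dim, bits_coarse, rng)
    h2 = make_hash(dim, bits_fine, rng)
    index = {}
    for i, f in enumerate(features):
        index.setdefault(h1(f), {}).setdefault(h2(f), []).append(i)
    return index, h1, h2

def query(index, h1, h2, features, q):
    """Candidates from the matching fine bucket, ranked by Euclidean distance."""
    candidates = index.get(h1(q), {}).get(h2(q), [])
    return sorted(candidates, key=lambda i: np.linalg.norm(features[i] - q))
```

A practical system would probe several nearby buckets and maintain multiple hash tables to raise recall; the sketch shows only the two-stage lookup itself.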
There are three works on deep learning for object detection and segmentation. In the first work, we propose a deep neural network fusion architecture for fast and robust pedestrian detection. The proposed architecture allows multiple networks to be run in parallel for speed. A single-shot deep convolutional network is trained as an object detector to generate all possible pedestrian candidates of different sizes and occlusion levels. Next, multiple deep neural networks are used in parallel to further refine these candidates. We introduce a soft-rejection based network fusion method that fuses the soft metrics from all networks into final confidence scores. Our method outperforms existing state-of-the-art methods, especially when detecting small or occluded pedestrians. Furthermore, we propose a method for integrating a pixel-wise semantic segmentation network into the fusion architecture as a reinforcement to the pedestrian detector. In the second work, extending the first, a fusion network is trained to fuse the multiple classification networks.
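The soft-rejection idea described above can be sketched as follows: each secondary classifier contributes a multiplicative factor to the detector's confidence rather than a hard accept/reject vote. The thresholding rule and the threshold and floor values below are illustrative assumptions, not the dissertation's exact settings.

```python
def soft_rejection_fuse(det_score, clf_probs, t=0.7, floor=0.1):
    """Fuse a detector confidence with secondary classifier probabilities.

    A classifier probability at or above the threshold `t` leaves the
    score unchanged; lower probabilities shrink it proportionally, but
    never by more than the `floor` factor, so no single network can veto
    a candidate outright. (`t` and `floor` are illustrative values.)
    """
    score = det_score
    for p in clf_probs:
        score *= 1.0 if p >= t else max(p / t, floor)
    return score
```

The floor is the essential design choice: a candidate that one classifier scores badly is attenuated but kept, so agreement among the remaining networks can still recover it.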
Furthermore, a novel soft-label method is devised to assign floating-point labels to the pedestrian candidates; the label for each candidate detection is derived from the overlap of its bounding box with the ground-truth boxes of the different classes. In the third work, we propose a boundary-sensitive deep neural network architecture for portrait segmentation. A framework based on residual networks and atrous convolution is trained as the base portrait segmentation network. To better handle boundary segmentation, three techniques are introduced. First, an individual boundary-sensitive kernel is constructed by labeling the boundary pixels as a separate class and using the soft-label strategy to assign floating-point label vectors to pixels in the boundary class; each such pixel contributes to multiple classes in the loss according to its position relative to the contour. Second, a global boundary-sensitive kernel assigns different weights to pixel locations in the loss to constrain the global shape of the resulting segmentation map. Third, we add multiple binary classifiers that predict boundary-sensitive portrait attributes, so as to refine the learning process of our model.
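The individual boundary-sensitive kernel can be illustrated with a toy sketch that assigns each pixel a soft label vector over (background, foreground, boundary), assuming a distance-to-contour map is already available. The linear split of label mass inside the boundary band is an illustrative assumption, not the dissertation's exact kernel.

```python
import numpy as np

def boundary_soft_labels(fg_mask, dist_to_contour, band=3.0):
    """Per-pixel soft label vectors over (background, foreground, boundary).

    Inside a band of width `band` around the contour, each pixel's unit
    label mass is split linearly between the boundary class and its own
    fg/bg class; outside the band, labels stay one-hot. So a pixel on
    the contour is pure boundary, and the boundary weight fades with
    distance. (The linear weighting is an illustrative choice.)
    """
    h, w = fg_mask.shape
    labels = np.zeros((h, w, 3))
    w_boundary = np.clip(1.0 - dist_to_contour / band, 0.0, 1.0)
    labels[..., 2] = w_boundary                      # boundary class
    labels[..., 0] = (fg_mask == 0) * (1.0 - w_boundary)  # background
    labels[..., 1] = (fg_mask == 1) * (1.0 - w_boundary)  # foreground
    return labels  # each pixel's vector sums to 1
```

Training against such vectors (e.g. with a cross-entropy loss over the three channels) lets pixels near the contour contribute to multiple classes at once, which is the effect the boundary-sensitive kernel is after.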