COMPUTER VISION AND DEEP LEARNING WITH APPLICATIONS TO OBJECT DETECTION, SEGMENTATION, AND DOCUMENT ANALYSIS

dc.contributor.advisor: Davis, Larry [en_US]
dc.contributor.advisor: Doermann, David [en_US]
dc.contributor.author: Du, Xianzhi [en_US]
dc.contributor.department: Electrical Engineering [en_US]
dc.contributor.publisher: Digital Repository at the University of Maryland [en_US]
dc.contributor.publisher: University of Maryland (College Park, Md.) [en_US]
dc.date.accessioned: 2018-07-17T05:34:30Z
dc.date.available: 2018-07-17T05:34:30Z
dc.date.issued: 2017 [en_US]
dc.description.abstract: This dissertation presents three works on signature matching for document analysis. In the first work, we propose a large-scale signature matching method based on locality sensitive hashing (LSH). Shape context features are used to describe the structure of signatures. Two stages of hashing are performed to find the nearest neighbors for query signatures. We show that our algorithm achieves high accuracy even when few signatures are collected from the same person, and that it performs fast matching on large datasets. In the second work, we present a novel signature matching method based on supervised topic models. Shape context features, which capture the local variations in signature properties, are extracted from signature shape contours. We then use topic models to learn the shape context features that correspond to individual authors. We demonstrate considerable improvement over state-of-the-art methods. In the third work, we present a partial signature matching method using graphical models. Extending the second work, modified shape context features are extracted from the contours of signatures to describe both full and partial signatures. Hierarchical Dirichlet processes are used to infer the number of salient regions needed. The results show the effectiveness of the approach for both partial and full signature matching.

The dissertation also presents three works on deep learning for object detection and segmentation. In the first work, we propose a deep neural network fusion architecture for fast and robust pedestrian detection. The proposed architecture allows multiple networks to be processed in parallel for speed. A single-shot deep convolutional network is trained as an object detector to generate all possible pedestrian candidates of different sizes and occlusions. Next, multiple deep neural networks are used in parallel to further refine these pedestrian candidates. We introduce a soft-rejection based network fusion method that fuses the soft metrics from all networks to generate the final confidence scores. Our method performs better than existing state-of-the-art methods, especially when detecting small and occluded pedestrians. Furthermore, we propose a method for integrating a pixel-wise semantic segmentation network into the network fusion architecture as a reinforcement to the pedestrian detector. In the second work, which extends the first, a fusion network is trained to fuse the multiple classification networks. Furthermore, a novel soft-label method is devised to assign floating-point labels to the pedestrian candidates. The label for each candidate detection is derived from the percentage of overlap of its bounding box with those of other ground truth classes. In the third work, we propose a boundary-sensitive deep neural network architecture for portrait segmentation. A framework based on a residual network and atrous convolution is trained as the base portrait segmentation network. To better handle boundary segmentation, three techniques are introduced. First, an individual boundary-sensitive kernel is introduced by labeling the boundary pixels as a separate class and using the soft-label strategy to assign floating-point label vectors to pixels in the boundary class. Each pixel contributes to multiple classes when updating the loss, based on its position relative to the contour. Second, a global boundary-sensitive kernel is used when updating the loss function to assign different weights to pixel locations in an image, constraining the global shape of the resulting segmentation map. Third, we add multiple binary classifiers to classify boundary-sensitive portrait attributes, so as to refine the learning process of our model. [en_US]
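As a rough illustration of the soft-label idea mentioned in the abstract, the sketch below assigns a floating-point label to a candidate box from its overlap with ground-truth boxes. The abstract does not specify the overlap measure or how multiple boxes are combined, so this sketch assumes intersection over union (IoU) and takes the maximum over ground-truth boxes; the function names `iou` and `soft_label` and the `(x1, y1, x2, y2)` box layout are hypothetical and are not taken from the dissertation.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlapping rectangle (zero if disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def soft_label(candidate, gt_boxes):
    """Assumed soft label: the largest IoU between the candidate and any
    ground-truth box, or 0.0 when there are no ground-truth boxes."""
    return max((iou(candidate, gt) for gt in gt_boxes), default=0.0)


if __name__ == "__main__":
    ground_truth = [(0, 0, 2, 2), (5, 5, 6, 6)]
    print(soft_label((1, 1, 3, 3), ground_truth))
```

Under these assumptions, a candidate that coincides with a ground-truth box receives a label of 1.0, a disjoint candidate receives 0.0, and partially overlapping candidates fall in between, which is what lets a classifier be trained on graded rather than binary targets.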
dc.identifier: https://doi.org/10.13016/M2VT1GS9H
dc.identifier.uri: http://hdl.handle.net/1903/20758
dc.language.iso: en [en_US]
dc.subject.pqcontrolled: Artificial intelligence [en_US]
dc.subject.pqcontrolled: Computer science [en_US]
dc.subject.pquncontrolled: Computer vision [en_US]
dc.subject.pquncontrolled: Deep learning [en_US]
dc.subject.pquncontrolled: Document analysis [en_US]
dc.subject.pquncontrolled: Object detection [en_US]
dc.subject.pquncontrolled: Semantic segmentation [en_US]
dc.title: COMPUTER VISION AND DEEP LEARNING WITH APPLICATIONS TO OBJECT DETECTION, SEGMENTATION, AND DOCUMENT ANALYSIS [en_US]
dc.type: Dissertation [en_US]

Files

Original bundle
Name: Du_umd_0117E_18703.pdf
Size: 21.67 MB
Format: Adobe Portable Document Format