Towards Robust, Interpretable and Scalable Visual Representations

Li, Ang


dc.contributor.advisor	Davis, Larry S	en_US
dc.contributor.author	Li, Ang	en_US
dc.contributor.department	Computer Science	en_US
dc.contributor.publisher	Digital Repository at the University of Maryland	en_US
dc.contributor.publisher	University of Maryland (College Park, Md.)	en_US
dc.date.accessioned	2017-09-14T05:43:51Z
dc.date.available	2017-09-14T05:43:51Z
dc.date.issued	2017	en_US
dc.description.abstract	Visual representation is one of the central problems in computer vision. The essential problem is to develop a unified representation that effectively encodes both visual appearance and spatial information so that it can be easily applied to various vision applications such as face recognition, image matching, and multimodal image retrieval. Over the history of computer vision research, four major levels of visual representation have emerged: geometric, low-level, mid-level, and high-level. The dissertation comprises four works studying effective visual representations at these four levels. Multiple approaches are proposed with the aim of improving the robustness, interpretability, and scalability of visual representations. Geometric features are effective for matching images under spatial transformations; however, their performance is sensitive to noise. In the first part, we model the uncertainty of geometric representations based on line segments so that these features can be robustly applied to image-based geolocation. In the second part, we study the robustness of feature encoding to noisy keypoints. We show that traditional feature encoding is sensitive to background or noisy features. We propose the Selective Encoding framework, which learns the relevance distribution of each codeword and incorporates this information into the original codebook model. Our approach is more robust to localization errors or uncertainty in the active face authentication application. The mission of visual understanding is to express and describe image content, which essentially means relating images to human language. This typically involves finding a common representation inferable from both domains of data.
In the third part, we propose a framework that extracts a mid-level spatial representation directly from language descriptions and matches such spatial layouts to detected object bounding boxes in order to retrieve indoor scene images from user text queries. Modern high-level visual features are typically learned from supervised datasets, whose scalability is largely limited by the requirement for dedicated human annotation. In the last part, we propose to learn visual representations from large-scale weakly supervised data for a large number of natural-language-based concepts, i.e., n-gram phrases. We propose a differentiable Jelinek-Mercer smoothing loss and train a deep convolutional neural network on images with associated user comments. We show that the learned model can predict a large number of phrase-based concepts from images, can be effectively applied to image-caption applications, and transfers well to other visual recognition datasets.	en_US
dc.identifier	https://doi.org/10.13016/M2GH9B936
dc.identifier.uri	http://hdl.handle.net/1903/19974
dc.language.iso	en	en_US
dc.subject.pqcontrolled	Computer science	en_US
dc.subject.pqcontrolled	Computer engineering	en_US
dc.subject.pqcontrolled	Information technology	en_US
dc.subject.pquncontrolled	Feature encoding	en_US
dc.subject.pquncontrolled	Generative modeling	en_US
dc.subject.pquncontrolled	Geometric matching	en_US
dc.subject.pquncontrolled	Image geolocation	en_US
dc.subject.pquncontrolled	Language and vision	en_US
dc.subject.pquncontrolled	Uncertainty modeling	en_US
dc.title	Towards Robust, Interpretable and Scalable Visual Representations	en_US
dc.type	Dissertation	en_US
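
The abstract names Jelinek-Mercer smoothing without defining it. As background only, the sketch below shows classic count-based Jelinek-Mercer interpolation for an n-gram language model, in which the maximum-likelihood estimate for a history is linearly mixed with the smoothed estimate for the shorter history. It is not the dissertation's differentiable loss, whose exact form is not given in this record; the function names, the full back-off rule for unseen histories, and the weight `lam` are illustrative assumptions.

```python
import math
from collections import Counter


def count_ngrams(tokens, max_n):
    """Count every n-gram (n = 1..max_n) and every history that is followed by a word."""
    ngrams, contexts = Counter(), Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            ngrams[gram] += 1
            if n > 1:
                # gram[:-1] is a history observed before some word; counting it this
                # way keeps each maximum-likelihood estimate normalized.
                contexts[gram[:-1]] += 1
    return ngrams, contexts


def jm_prob(word, history, ngrams, contexts, total, lam):
    """Recursive Jelinek-Mercer interpolation:
    P(w | h) = lam * P_ML(w | h) + (1 - lam) * P(w | h without its oldest word),
    ending at the maximum-likelihood unigram c(w) / total."""
    history = tuple(history)
    if not history:
        return ngrams[(word,)] / total
    lower = jm_prob(word, history[1:], ngrams, contexts, total, lam)
    if contexts[history] == 0:
        # Assumption: an unseen history backs off entirely, so probabilities still sum to 1.
        return lower
    p_ml = ngrams[history + (word,)] / contexts[history]
    return lam * p_ml + (1 - lam) * lower


def jm_nll(sequence, ngrams, contexts, total, max_n, lam):
    """Negative log-likelihood of a sequence under the smoothed model.
    Each probability is a convex combination of the component estimates, so the
    loss is differentiable with respect to them (and to lam) wherever P > 0."""
    nll = 0.0
    for i, word in enumerate(sequence):
        history = sequence[max(0, i - max_n + 1):i]
        nll -= math.log(jm_prob(word, history, ngrams, contexts, total, lam))
    return nll
```

For the toy corpus `a b a b a c` with `max_n=2` and `lam=0.5`, `jm_prob("b", ["a"], ...)` mixes P_ML(b | a) = 2/3 with P(b) = 2/6, giving about 0.5; because every level is a convex combination, the smoothed probabilities over the vocabulary for a given history still sum to 1.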

Files

Original bundle

Name: Li_umd_0117E_18362.pdf
Size: 40.04 MB
Format: Adobe Portable Document Format

Collections

UMD Theses and Dissertations
Computer Science Theses and Dissertations