Computer Science Theses and Dissertations

Permanent URI for this collection: http://hdl.handle.net/1903/2756

  • Item
    Human-Centric Deep Generative Models: The Blessing and The Curse
    (2021) Yu, Ning; Davis, Larry; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Over the past years, deep neural networks have achieved significant progress in a wide range of real-world applications. My research focuses in particular on deep generative models, a neural network solution that has proven effective for visual (re)creation. But is generative modeling a niche topic to be studied on its own? My answer is an emphatic no. In this thesis, I present the two sides of deep generative models: their blessing and their curse to human beings. Regarding what deep generative models can do for us, I demonstrate improvements in the performance and steerability of visual (re)creation. Regarding what we can do for deep generative models, my answer is to mitigate the security concerns raised by DeepFakes and to improve the minority inclusion of deep generative models. For performance, I apply attention modules and a dual contrastive loss to generative adversarial networks (GANs), pushing photorealistic image generation to a new state of the art (a minimal sketch of this loss follows the abstract). For steerability, I introduce Texture Mixer, a simple yet effective approach to steerable texture synthesis and blending. For security, my research spans a series of GAN fingerprinting solutions that enable the detection and attribution of misused GAN-generated images. For inclusion, I investigate the biased behavior of generative models and present a solution that enhances the minority inclusion of GANs over underrepresented image attributes. All in all, I aim to distill actionable insights for the applications of deep generative models and, ultimately, to contribute to human-generator interaction.
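    The dual contrastive loss named above can be read as an InfoNCE-style objective in the discriminator's logit space: each real image's logit is the positive in a softmax over a batch of fake logits, and symmetrically for fakes with flipped signs. The following PyTorch sketch illustrates that reading; the function name, the (B,)-shaped logit interface, and the equal real/fake batch sizes are illustrative assumptions, not the thesis's own implementation.

        import torch
        import torch.nn.functional as F

        def dual_contrastive_loss(real_logits, fake_logits):
            # real_logits, fake_logits: (B,) discriminator outputs on real
            # and generated images (assumed equal batch size B).
            b = real_logits.size(0)
            target = torch.zeros(b, dtype=torch.long, device=real_logits.device)
            # Each real logit (kept at index 0) is the positive in a softmax
            # against the whole batch of fake logits.
            logits_real = torch.cat(
                [real_logits.unsqueeze(1),
                 fake_logits.unsqueeze(0).expand(b, -1)], dim=1)
            loss_real = F.cross_entropy(logits_real, target)
            # Symmetrically, each sign-flipped fake logit is the positive
            # against the sign-flipped real logits.
            logits_fake = torch.cat(
                [-fake_logits.unsqueeze(1),
                 -real_logits.unsqueeze(0).expand(b, -1)], dim=1)
            loss_fake = F.cross_entropy(logits_fake, target)
            return loss_real + loss_fake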
  • Item
    Towards Robust, Interpretable and Scalable Visual Representations
    (2017) Li, Ang; Davis, Larry S; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Visual representation is one of the central problems in computer vision. The essential problem is to develop a unified representation that effectively encodes both visual appearance and spatial information, so that it can be readily applied to vision tasks such as face recognition, image matching, and multimodal image retrieval. Over the history of computer vision research, four major levels of visual representation have emerged: geometric, low-level, mid-level, and high-level. This dissertation comprises four works studying effective visual representations at these four levels, proposing multiple approaches that aim to improve the robustness, interpretability, and scalability of visual representations. Geometric features are effective for matching images under spatial transformations; however, their performance is sensitive to noise. In the first part, we model the uncertainty of geometric representations based on line segments and equip these features with uncertainty estimates so that they can be robustly applied to image-based geolocation. In the second part, we study the robustness of feature encoding to noisy keypoints. We show that traditional feature encoding is sensitive to background or noisy features, and we propose the Selective Encoding framework, which learns the relevance distribution of each codeword and incorporates that information into the original codebook model. Our approach is more robust to localization errors and uncertainty in active face authentication. The mission of visual understanding is to express and describe image content, which essentially relates images to human language; this typically involves finding a common representation inferable from both domains of data. In the third part, we propose a framework that extracts a mid-level spatial representation directly from language descriptions and matches such spatial layouts to detected object bounding boxes, retrieving indoor scene images from user text queries. Modern high-level visual features are typically learned from supervised datasets, whose scalability is largely limited by the need for dedicated human annotation. In the last part, we learn visual representations from large-scale weakly supervised data for a large number of natural-language concepts, i.e., n-gram phrases. We propose a differentiable Jelinek-Mercer smoothing loss (sketched after this abstract) and train a deep convolutional neural network on images with associated user comments. We show that the learned model can predict a large number of phrase-based concepts from images, can be effectively applied to image-captioning tasks, and transfers well to other visual recognition datasets.
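    Jelinek-Mercer smoothing, referenced above, interpolates a model's predicted probability with a fixed backoff distribution before taking the negative log-likelihood, which keeps the loss finite and differentiable even for rare phrases. Below is a minimal PyTorch sketch under stated assumptions: a flat phrase vocabulary, a precomputed backoff distribution (e.g., corpus phrase frequencies), and an interpolation weight lam; the function name and interface are illustrative, not the dissertation's implementation.

        import torch

        def jelinek_mercer_loss(logits, targets, backoff_probs, lam=0.8):
            # logits: (B, V) scores over a phrase vocabulary of size V.
            # targets: (B,) indices of the observed phrases.
            # backoff_probs: (V,) fixed backoff distribution, e.g. corpus
            # phrase frequencies. lam: interpolation weight in [0, 1].
            probs = torch.softmax(logits, dim=1)
            p_model = probs.gather(1, targets.unsqueeze(1)).squeeze(1)
            p_backoff = backoff_probs[targets]
            # Convex mixture of model and backoff probabilities, then NLL;
            # the small epsilon guards the log at zero probability.
            smoothed = lam * p_model + (1.0 - lam) * p_backoff
            return -torch.log(smoothed + 1e-12).mean()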