Learning from Less Data: Perception and Synthesis
Abstract
Machine learning techniques have transformed various fields, particularly computer vision. However, they typically require vast amounts of labeled data for training, which is costly and often impractical to obtain. This dependency on data highlights the importance of research into data efficiency. In this thesis, we present advances in data-efficient deep learning for visual perception and visual generation tasks.
In the first half of the thesis, we focus on data efficiency in visual perception. Specifically, we tackle semantic segmentation for autonomous driving, assuming limited access to labeled data in both the target and related domains. We propose self-supervised learning solutions that improve segmentation in unstructured environments and adverse weather conditions, ultimately extending them to a more general approach that performs on par with methods trained on immense amounts of labeled data, achieving up to 30% improvement over prior work. Next, we address data efficiency for autonomous aerial vehicles, specifically video action recognition. Here, we integrate concepts from signal processing into neural networks, achieving both data and computational efficiency. Additionally, we propose differentiable learning methods for these representations, resulting in 8-38% improvements over previous work.
In the second half of the thesis, we turn to data efficiency in visual generation. First, we focus on efficient generation of aerial-view images, using pretrained models to create aerial perspectives of input scenes in a zero-shot manner. By incorporating techniques from classical computer vision and information theory, our work enables the generation of aerial images from complex, real-world inputs without requiring any 3D or paired data during training or testing, performing on par with concurrent methods trained on vast amounts of 3D data. Next, we focus on zero-shot personalized image and video generation, which aims to create content based on custom concepts. We propose methods that leverage prompting to generate images and videos at the intersection of the manifolds corresponding to these concepts and pretrained models, with applications in subject-driven action transfer and multi-concept video customization. These solutions are among the first in this area and show significant improvements over baselines and related work. They are also data and compute efficient, relying solely on pretrained models without any additional training data. Finally, we introduce a fundamental prompting solution inspired by techniques from finance and economics, demonstrating how insights from different fields can effectively address similar mathematical challenges.