Learning from Less Data: Perception and Synthesis
Abstract
Machine learning techniques have transformed various fields, particularly computer vision. However, they typically require vast amounts of labeled data for training, which is costly and often impractical to obtain. This dependency on data highlights the importance of research into data efficiency. In this thesis, we present advances in data-efficient deep learning for visual perception and visual generation tasks.
In the first half of the thesis, we focus on data efficiency in visual perception. Specifically, we tackle semantic segmentation for autonomous driving, assuming limited access to labeled data in both the target and related domains. We propose self-supervised learning solutions that improve segmentation in unstructured environments and adverse weather conditions, ultimately extending them to a more general approach that performs on par with methods trained on immense amounts of labeled data, achieving up to 30% improvement over prior work. Next, we address data efficiency for autonomous aerial vehicles, specifically video action recognition. Here, we integrate concepts from signal processing into neural networks, achieving both data and computational efficiency. Additionally, we propose differentiable learning methods for these representations, resulting in 8-38% improvements over previous work.
In the second half of the thesis, we turn to data efficiency in visual generation. First, we focus on efficient generation of aerial-view images, using pretrained models to create aerial perspectives of input scenes in a zero-shot manner. By incorporating techniques from classical computer vision and information theory, our work enables the generation of aerial images from complex, real-world inputs without requiring any 3D or paired data during training or testing, performing on par with concurrent methods trained on vast amounts of 3D data. Next, we focus on zero-shot personalized image and video generation, which aims to create content based on custom concepts. We propose methods that leverage prompting to generate images and videos at the intersection of the manifolds corresponding to these concepts and pretrained models, with applications in subject-driven action transfer and multi-concept video customization. These solutions are among the first in this area and show significant improvements over baselines and related work. They are also data and compute efficient, relying solely on pretrained models without any additional training data. Finally, we introduce a fundamental prompting solution inspired by techniques from finance and economics, demonstrating how insights from different fields can effectively address similar mathematical challenges.