UMD Theses and Dissertations

Permanent URI for this collection: http://hdl.handle.net/1903/3

New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date, so a given thesis/dissertation may take up to four months to appear in DRUM.

More information is available at Theses and Dissertations at University of Maryland Libraries.

Search Results

Now showing 1 - 4 of 4
  • Item
    Analyzing Inverse Design Problems from a Topological Perspective
    (2024) Chen, Qiuyi; Fuge, Mark; Mechanical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Inverse design (ID) problems are inverse problems that aim to rapidly retrieve the subset of valid designs having the desired performances and properties under given conditions. In practice, this can be done by training generative models to approximate and sample the posterior distributions of designs. However, little has been done to understand their mechanisms and limitations from a theoretical perspective. This dissertation leverages theoretical tools from general and differential topology to answer three questions about inverse design: What does a set of valid designs look like? How helpful are data-driven generative models for retrieving the desired designs from this set? What topological properties affect the subset of desired designs? The dissertation proceeds by dismantling inverse (design) problems into two major subjects: the representation and probing of a given set of valid designs (or data), and the retrieval of the desired designs (or data) from this set. It draws inspiration from topology and geometry to investigate them, making the following main contributions: 1. Chapter 3 details a novel representation learning method called Least Volume, which has properties similar to nonlinear PCA for representing datasets. It can minimize the representation's dimension automatically and, as shown in Chapter 4, performs contrastive learning when applied to labeled datasets. 2. Two conditional generative models are developed to generate performant 2-D airfoils and 3-D heat sinks in Chapters 5 and 6, respectively. They can produce realistic designs to warm-start further optimization, with the relevant chapters detailing their acceleration effects. 3. Lastly, Chapter 7 describes how to use Least Volume to solve high-dimensional inverse problems efficiently. Specifically, using examples from physical system identification, the chapter uncovers the correlation between an inverse problem's uncertainty and its intrinsic dimensions.
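As a rough illustration of the Least Volume idea in contribution 1: penalizing the geometric mean of per-dimension latent scales drives unused latent dimensions toward zero, shrinking the effective representation dimension. The penalty form, names, and data below are this editor's assumptions, a minimal sketch rather than the dissertation's exact formulation.

```python
import numpy as np

def least_volume_penalty(z, eps=1e-2):
    """Hypothetical sketch: score a latent code matrix z (n_samples x n_dims)
    by the geometric mean of its per-dimension standard deviations, a proxy
    for the 'volume' the codes occupy. Minimizing this alongside a
    reconstruction loss would push unused dimensions to collapse."""
    stds = z.std(axis=0) + eps          # eps keeps the log well-defined
    return float(np.exp(np.mean(np.log(stds))))

# A code spread over 3 dimensions has a larger volume than one that
# effectively uses only 1 dimension.
rng = np.random.default_rng(0)
z_spread = rng.normal(size=(256, 3))
z_flat = z_spread.copy()
z_flat[:, 1:] = 0.0                     # two dimensions collapsed
print(least_volume_penalty(z_spread) > least_volume_penalty(z_flat))  # True
```

In a full autoencoder this penalty would be weighed against reconstruction quality, so only dimensions that do not pay for themselves collapse.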
  • Item
    Interpreting Visual Representations and Mitigating their Failures
    (2024) Kalibhat, Neha; Feizi, Soheil; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Deep learning has become the cornerstone of artificial intelligence (AI), particularly in the language and computer vision domains. The progression in this field is reflected in numerous applications accessible to the general public, such as information retrieval via virtual assistants, content generation, autonomous vehicles, drug discovery, and medical imaging. This unprecedented rate of AI adoption raises the critical need for research on the fundamental underpinnings of deep neural networks, to understand what leads to their decisions and why they fail. This thesis concentrates on self-supervised representation learning, a prevalent unsupervised method employed by foundation models to extract patterns from extensive visual data. Specifically, our focus lies in examining the low-dimensional representations generated by these models and dissecting their failure modes. In our initial investigation, we discover that self-supervised representations lack robustness to domain shifts, as they are not explicitly trained to distinguish image content from its domain. We remedy this issue by proposing a module that can be plugged into existing self-supervised baselines to disentangle their representation spaces and promote domain invariance and generalization. Our subsequent analysis delves into the patterns within representations that influence downstream classification. We scrutinize the discriminative capacity of individual features and their activations. We then propose an unsupervised quality metric that can preemptively determine, with high precision, whether a given representation will be correctly or incorrectly classified. In the next segment of this thesis, we leverage our findings to further demystify the representation space by uncovering interpretable subspaces that have unique concepts associated with them. We design a novel explainability framework that uses a vision-language model (such as CLIP) to provide natural language explanations for neural features (or groups of features) of a given pre-trained model. We next investigate the role of augmentations and format transformations in learning generalizable visual representations. Drawing inspiration from advancements in the audio and speech modalities, we examine how presenting visual data in multiple formats affects learning, separating this from the impact of augmentations. In the final segment, we reveal compositionality as a notable failure mode in current state-of-the-art representation methods. We critique the use of fixed-size patches in vision transformers and demonstrate the benefits of employing semantically meaningful patches based on visual priors. This design adjustment leads to significant improvements in image-text retrieval tasks and, more importantly, enhances performance on compositionality benchmarks.
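The CLIP-based explanation idea can be caricatured as scoring a neural feature direction against text-derived concept embeddings. Below is a minimal sketch with stand-in embeddings; a real pipeline would use CLIP's text encoder and project vision features into the shared space, and all names and vectors here are hypothetical.

```python
import numpy as np

def explain_feature(feature, concept_embeddings, concept_names):
    """Hedged sketch: rank candidate concepts for a feature direction by
    cosine similarity and return the best match. The concept embeddings
    stand in for text-encoder outputs (e.g. CLIP's)."""
    f = feature / np.linalg.norm(feature)
    c = concept_embeddings / np.linalg.norm(concept_embeddings, axis=1,
                                            keepdims=True)
    sims = c @ f                         # cosine similarity per concept
    return concept_names[int(np.argmax(sims))]

# Toy concept dictionary; rows play the role of encoded concept texts.
concepts = np.array([[1.0, 0.0, 0.0],   # "stripes"
                     [0.0, 1.0, 0.0],   # "fur"
                     [0.0, 0.0, 1.0]])  # "wheels"
names = ["stripes", "fur", "wheels"]
print(explain_feature(np.array([0.1, 0.9, 0.2]), concepts, names))  # fur
```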
  • Item
    SELF SUPERVISED LEARNING ON LARGE SCALE DATASETS
    (2023) Mishra, Shlok Kumar; Jacobs, David; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Humans and animals possess the remarkable ability to comprehend and perceive the world around them with minimal, if any, reliance on explicit labels. Much of the knowledge humans acquire is obtained without direct supervision, simply by processing extensive amounts of unlabeled data. This observation strongly suggests that enabling machines to grasp the world without labels could represent a fundamental approach to artificial intelligence. However, the vast majority of advancements achieved by state-of-the-art deep neural networks have been fueled by their dependence on annotated datasets. Annotating datasets is both costly and impractical for numerous domains. This manuscript discusses various ways machines can be taught without labels using Self-Supervised Learning (SSL). We show that training machines without labels can generally result in less biased and more robust representations. This manuscript addresses four main issues in SSL. The first problem we tackle is the over-reliance of neural networks on low-level shortcuts such as texture. Consider the example of a sofa with the texture of a leopard: state-of-the-art neural networks will often predict this sofa to be a leopard instead of a sofa. Unlike humans, neural networks do not understand the shapes of objects and often rely on low-level cues. To address this, we propose two methods. First, to reduce reliance on texture cues, we suppress texture in images, which helps the networks focus less on texture and more on higher-level information such as shape. Second, we augment SSL methods with negative samples that contain only the texture from the images. By augmenting with texture-based images, our method achieves better generalization, especially in out-of-domain settings. The second problem we address is the poor performance of SSL methods on multi-object datasets like OpenImages.
    One of the fundamental reasons behind this is the cropping data augmentation that selects sub-regions of an image to be used as positive samples. These positive samples are generally very meaningful in object-centric datasets, since the two views often share semantic overlap. However, this does not hold for multi-object datasets, where multiple objects are present and the two views might not overlap semantically. To remedy this, we propose replacing one or both of the random crops with crops obtained from an object proposal algorithm. This encourages the network to learn more object-aware representations, which yields significant improvement over the random-crop baselines. Third, current SSL networks generally treat objects and scenes within the same framework. However, visually similar objects are close in the representation space, so we argue that scenes and objects should follow a hierarchical structure based on their compositionality. To this end, we propose a contrastive learning framework in which a Euclidean loss is used to learn object representations and a hyperbolic loss encourages representations of scenes to lie close to representations of their constituent objects in hyperbolic space. Our hyperbolic loss encourages the network to exhibit a scene-object hypernymy by optimizing the magnitudes of their norms. Finally, we address the challenge of training self-supervised learning (SSL) methods on vast real-world datasets like JFT. Currently, state-of-the-art SSL methods struggle to perform effectively on JFT due to its skewed data distribution. To address this issue, we present a novel approach that combines Masked Autoencoders and contrastive learning. We introduce CAN, a concise and conceptually clear fusion of three components: (C) contrastive learning, (A) masked autoencoders, and (N) the noise prediction approach commonly used in diffusion models. These learning mechanisms complement each other in the following ways: contrastive learning shapes the embedding space when processing a batch of images; masked autoencoders focus on reconstructing low-frequency spatial correlations within a single image; and noise prediction reconstructs high-frequency image components. When combined, our approach surpasses the performance of its individual constituents, MAE and SimCLR, across a wide range of downstream transfer learning and robustness tasks.
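The hyperbolic scene-object loss described in this abstract operates on distances in a hyperbolic model such as the Poincaré ball, where distance grows rapidly near the boundary, so a point's norm can encode its place in a hierarchy. The sketch below shows only that distance function; the dissertation's actual loss may differ.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Distance between two points inside the unit Poincare ball. Points
    closer to the boundary (larger norm) are exponentially farther from
    everything, which is what lets norms express a scene/object hypernymy."""
    uu = np.dot(u, u)
    vv = np.dot(v, v)
    duv = np.dot(u - v, u - v)
    x = 1.0 + 2.0 * duv / ((1.0 - uu) * (1.0 - vv) + eps)
    return float(np.arccosh(x))

# Equal Euclidean steps cost more hyperbolic distance near the boundary.
origin = np.zeros(2)
near = np.array([0.1, 0.0])
far = np.array([0.9, 0.0])
print(poincare_distance(origin, far) > poincare_distance(origin, near))  # True
```

A training loss in this geometry could then pull each scene embedding toward its constituent objects while regularizing their norms to respect the hierarchy.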
  • Item
    Towards Reliable and Efficient Representation Learning
    (2022) Zhu, Chen; Goldstein, Tom; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Large-scale representation learning has achieved enormous success during the past decade, surpassing human-level accuracy on a range of benchmarks including image recognition and language understanding. This success is supported by advances in both algorithms and computing capabilities, which enable training large models on enormous amounts of data. While performance continues to improve on existing benchmarks with larger models and training datasets, the reliability and efficiency of large models are often questioned for practical deployment. Uncurated datasets may have been poisoned to manipulate model behavior, while practical deployment requires models to be trained or updated quickly on the latest data and to have low inference latency. This dissertation studies how to improve the reliability and efficiency of representation learning. On reliability, we study the threats of data poisoning and evasion attacks and how to defend against them. We propose a more vicious targeted clean-label poisoning attack that is highly effective even when the target architecture is unknown. To defend against such threats, we develop a k-NN-based method in feature space to filter poison examples out of the training set, which effectively reduces the success rate of poisoning attacks at an insignificant cost in accuracy. For evasion attacks, we demonstrate a new threat model against transfer learning, where the attack can succeed without knowledge of the specific classification head. In a broader sense, we also propose methods to enhance empirical and certified robustness against evasion attacks. For efficiency, our study focuses on three dimensions: data efficiency, convergence speed, and computational complexity. For data efficiency, we propose enhanced adversarial training algorithms as a general data augmentation technique to improve the generalization of models given the same amount of labeled data, and show their efficacy for Transformer models on a range of language understanding tasks. For convergence speed, we propose an automated initialization scheme that accelerates the convergence of convolutional networks for image recognition and Transformers for machine translation. For computational complexity, to scale Transformers to long sequences, we propose a linear-complexity attention mechanism that improves efficiency while preserving the performance of full attention on a range of language and vision tasks.
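The k-NN poisoning defense described in the abstract above can be sketched as a feature-space majority vote: a training point is suspect when its label disagrees with the labels of its nearest neighbors. Function names and data below are illustrative assumptions, not the dissertation's exact method.

```python
import numpy as np

def knn_filter(features, labels, k=3):
    """Hedged sketch of a k-NN poison filter: keep a training point only if
    its label matches the majority label of its k nearest feature-space
    neighbors. Clean-label poisons tend to sit among points of another
    class in feature space, so they fail this vote."""
    keep = np.empty(len(features), dtype=bool)
    for i, f in enumerate(features):
        dists = np.linalg.norm(features - f, axis=1)
        nn = np.argsort(dists)[1:k + 1]              # skip the point itself
        majority = np.bincount(labels[nn]).argmax()  # neighbors' top label
        keep[i] = labels[i] == majority
    return keep

# Two tight clusters plus one mislabeled point planted inside the first.
features = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.05, 0.05],
                     [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]])
labels = np.array([0, 0, 0, 1, 1, 1, 1])
keep = knn_filter(features, labels)
print(keep)  # point 3, the planted "poison", is the only one filtered out
```

In practice the features would come from the penultimate layer of the network being defended, and filtered points would simply be dropped before (re)training.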