Theses and Dissertations from UMD
Permanent URI for this community: http://hdl.handle.net/1903/2
New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means that there may be up to a four-month delay in the appearance of a given thesis/dissertation in DRUM.
More information is available at Theses and Dissertations at University of Maryland Libraries.
13 results
Search Results
Item FOUNDATIONS OF TRUSTWORTHY DEEP LEARNING: FAIRNESS, ROBUSTNESS, AND EXPLAINABILITY (2024) Nanda, Vedant; Dickerson, John; Gummadi, Krishna; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Deep Learning (DL) models, especially with the rise of so-called foundation models, are increasingly used in real-world applications as autonomous systems (e.g., facial recognition), as decision aids (e.g., medical imaging, writing assistants), and even to generate novel content (e.g., chatbots, image generators). This naturally raises concerns about the trustworthiness of these systems: do the models systematically perform worse for certain subgroups? Are their outputs reliable under perturbations to the inputs? This thesis aims to strengthen the foundations of DL models so they can be trusted in deployment. I will cover three important aspects of trust: fairness, robustness, and explainability. I will argue that we need to expand the scope of each of these aspects when applying them to DL models and carefully consider possible tradeoffs between these desirable but sometimes conflicting notions of trust. Traditionally, the fairness community has worked on mitigating biases in classical models such as Support Vector Machines (SVMs) and logistic regression. However, many real-world applications in which bias shows up in myriad ways involve much more complicated DL models. In the first part, I will present two works that show how thinking about fairness for DL introduces new challenges, especially due to these models' overparametrized nature and susceptibility to adversarial attacks. The robustness literature has focused largely on measuring the invariance of models to carefully constructed (adversarial attacks) or natural (distribution shifts) noise.
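The subgroup performance question raised above can be made concrete with a small disparity check. This is an illustrative sketch with invented toy labels and group identifiers, not code or data from the thesis.

```python
# Hypothetical sketch: the largest accuracy gap between subgroups, one
# simple way to quantify "does the model perform worse for some groups?"

def subgroup_accuracy_gap(y_true, y_pred, groups):
    """Return the largest difference in accuracy between any two subgroups."""
    accs = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        correct = sum(1 for i in idx if y_true[i] == y_pred[i])
        accs[g] = correct / len(idx)
    return max(accs.values()) - min(accs.values())

# Toy data: group "a" is classified correctly 3/4 of the time, group "b" 1/2.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
groups = ["a", "a", "a", "a", "b", "b"]
print(subgroup_accuracy_gap(y_true, y_pred, groups))  # 0.25
```

A gap near zero is necessary but not sufficient for fairness; the thesis argues that such classical notions need rethinking for overparametrized DL models.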
In the second part, I will argue that to get truly robust models, we must focus on a more general notion of robustness: measuring the alignment of the invariances of DL models with those of other models of perception, such as humans. I will present two works that measure shared invariances (1) between DL models and humans, and (2) between DL models. Such measurements provide a measure of relative robustness, through which we can better understand the failure modes of DL models and work towards building truly robust systems. Finally, in the third part, I will show how even a small subset of randomly chosen neurons from a pre-trained representation can transfer very well to downstream tasks. We call this phenomenon diffused redundancy and observe it in a variety of pre-trained representations. This finding challenges existing beliefs in the explainability literature that individual neurons learn disjoint, semantically meaningful concepts.
Item IMPROVING MODEL AND DATA EFFICIENCY FOR DEEP LEARNING (2023) Ni, Renkun; Goldstein, Tom; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Deep learning has achieved or even surpassed human-level performance on a wide range of challenging tasks encompassing computer vision, natural language processing, and speech recognition. Nevertheless, such achievements are predominantly derived from training huge models (i.e., billions of parameters) on numerous labeled examples, which requires considerable computational resources and costly data collection. Various studies have strived to enhance efficiency in these domains. In terms of model efficiency, remarkable advancements have been made in accelerating training and inference through methods such as quantization and pruning.
Regarding data efficiency, few-shot learning, semi-supervised learning, and self-supervised learning have gathered attention due to their ability to learn feature representations from few labeled examples or even without human supervision. This dissertation introduces several improvements and provides an in-depth analysis of these methodologies, aiming to address the computational challenges and augment the efficiency of deep learning models, especially in computer vision. In addressing model efficiency, we explore the potential for improvement in both the training and inference phases of deep learning. For model inference acceleration, we investigate the challenges of using extremely low-resolution arithmetic in quantization methods, where integer overflows happen frequently and models are sensitive to these overflows. To address this issue, we introduce a novel module designed to emulate the "wrap-around" property of integer overflow, which maintains comparable performance with 8-bit low-resolution accumulators. In addition, to scale inference of Vision Transformers on mobile devices, we propose an efficient and flexible local self-attention mechanism, optimized directly on mobile devices, that achieves comparable performance to global attention while significantly reducing on-device latency, especially for high-resolution tasks. Besides computational cost, training deep neural networks consumes a large amount of memory, which is another bottleneck to model training on edge devices. To improve the memory efficiency of training deep networks on resource-limited devices, we propose a quantization-aware training framework for federated learning in which only the quantized model is distributed and trained on client devices. In the realm of label efficiency, we first develop a better understanding of the models trained by meta-learning, which has a unique training pipeline, for few-shot classification tasks.
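The "wrap-around" property of integer overflow mentioned above can be emulated with plain modular arithmetic. The function below is a hedged sketch of that behavior for a signed low-bit accumulator, not the module proposed in the dissertation.

```python
# Hedged sketch (not the dissertation's module): emulating the "wrap-around"
# property of integer overflow in a signed, low-bit accumulator using
# modular arithmetic.

def wraparound_accumulate(values, bits=8):
    """Sum integers in a signed `bits`-wide register that wraps on overflow."""
    lo, span = -(1 << (bits - 1)), 1 << bits  # e.g. -128 and 256 for 8 bits
    acc = 0
    for v in values:
        acc = (acc + v - lo) % span + lo  # wrap back into [-128, 127]
    return acc

print(wraparound_accumulate([100, 50]))  # 150 overflows int8 and wraps to -106
print(wraparound_accumulate([100, 20]))  # 120 fits, so it stays 120
```

Making overflow differentiable-friendly and visible during training, rather than a silent failure at inference time, appears to be the intuition the abstract describes.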
In addition, a comprehensive analysis has been conducted on integrating data augmentation strategies into the meta-learning pipeline, leading to Meta-MaxUp, a novel data augmentation technique for meta-learning that demonstrates enhanced few-shot performance across various benchmarks. Beyond few-shot learning, the research explores the application of meta-learning methods in the context of self-supervised learning. We discuss the close relationship, under a certain task distribution, between meta-learning and contrastive learning, a method that achieves excellent results in self-supervised learning.
Item DEEP LEARNING APPLICATIONS IN BONE MINERAL DENSITY ESTIMATION, SPINE VERTEBRA DETECTION, AND LIVER TUMOR SEGMENTATION (2023) Wang, Fakai; Wu, Min; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
As populations age and related health concerns emerge in more countries than ever, we face many challenges, such as the availability, quality, and cost of medical resources. Thanks to the development of machine learning and computer vision in recent years, Deep Learning (DL) can help solve some medical problems. The diagnosis of various diseases (such as spine disorders, low bone mineral density, and liver cancer) relies on X-rays or Computed Tomography (CT). DL models can automatically analyze these radiography scans and help with diagnosis. Different organs and diseases have distinct characteristics, requiring customized algorithms and models. In this dissertation, we investigate several Computer-Aided Diagnosis (CAD) tasks and present corresponding DL solutions. Deep learning has multiple advantages. Firstly, DL models can uncover underlying health issues invisible to humans. One example is the opportunistic screening of osteoporosis through chest X-rays. We develop DL models that utilize chest films to predict bone mineral density, which helps prevent bone fractures.
Humans cannot tell anything about bone density from a chest film, but DL models can reliably make the prediction. The second advantage is accuracy and efficiency. Reading radiographs is tedious and requires years of expertise. This is particularly true when a radiologist needs to localize potential liver tumors by looking through tens of CT slices, spending several minutes. Deep learning models can localize and identify tumors within seconds, greatly reducing human labor. Experiments show DL models can pick up small tumors that are hardly visible to the naked eye. Attention should also be paid to deep learning's limitations. Firstly, DL models lack explainability: they store diagnostic knowledge and statistical patterns in their parameters, which are obscure to humans. Secondly, uncertainty exists for rare diseases; if not exposed to rare cases, the models yield uncertain outcomes. Thirdly, training AI models depends on high-quality data, but labeling quality varies in clinical practice. Despite these challenges and issues, deep learning models are promising for advancing medical diagnosis in society.
Item ROBUSTNESS AND UNDERSTANDABILITY OF DEEP MODELS (2022) Ghiasi, Mohammad Amin; Goldstein, Thomas; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Deep learning has made a considerable leap in the past few decades, from promising models for solving various problems to becoming state-of-the-art. However, unlike classical machine learning models, it is sometimes difficult to explain why and how deep learning models make decisions. Their performance can also drop with small amounts of noise. In short, deep learning models are well-performing, easily corrupted, hard-to-understand models that beat human beings at many tasks. Consequently, improving these deep models requires a deep understanding.
While deep learning models usually generalize well on unseen data, adding negligible amounts of noise to their input can flip their decision. This phenomenon is known as "adversarial attacks." In this thesis, we study several defense methods against such adversarial attacks. More specifically, we focus on defense methods that, unlike traditional methods, use less computation or fewer training examples. We also show that despite improvements in adversarial defenses, even provable certified defenses can be broken. Moreover, we revisit regularization to improve adversarial robustness. Over the past years, many techniques have been developed for understanding and explaining how deep neural networks make decisions. This thesis introduces a new method for studying the building blocks of neural networks' decisions. First, we introduce Plug-In Inversion, a new method for inverting and visualizing deep neural network architectures, including Vision Transformers. Then we study the features a Vision Transformer (ViT) learns in order to make a decision. We compare these features when the network trains on labeled data versus when it uses a language model's supervision for training, as in CLIP. Last, we introduce feature sonification, which borrows feature visualization techniques to study models trained for speech recognition (non-vision) tasks.
Item Spectral Methods for Neural Network Designs (2022) Su, Jiahao; Huang, Furong; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Neural networks are general-purpose function approximators. Given a problem, engineers or scientists select a hypothesis space of functions with specific properties by designing the network architecture. However, mainstream designs are often ad hoc and can suffer from numerous undesired properties. Most prominently, network architectures are gigantic, with most parameters redundant while still consuming computational resources.
Furthermore, the learned networks are sensitive to adversarial perturbation and tend to underestimate predictive uncertainty. We aim to understand and address these problems using spectral methods: while these undesired properties are hard to interpret from network parameters in the original domain, we can establish their relationships when we represent the parameters in a spectral domain. These relationships allow us to design networks with certified properties via the spectral representation of parameters.
Item Enhancing Visual and Gestural Fidelity for Effective Virtual Environments (2020) Meng, Xiaoxu; Varshney, Amitabh; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
A challenge the virtual reality (VR) industry faces is that VR is not immersive enough to give people a genuine sense of presence: low frame rates lead to dizziness, and the lack of human body visualization limits human-computer interaction. In this dissertation, I present our research on enhancing visual and gestural fidelity in virtual environments. First, I present a new foveated rendering technique, Kernel Foveated Rendering (KFR), which parameterizes foveated rendering by embedding polynomial kernel functions in log-polar space. This GPU-driven technique uses parameterized foveation that mimics the distribution of photoreceptors in the human retina. I present a two-pass kernel foveated rendering pipeline that maps well onto modern GPUs. I have carried out user studies to empirically identify the KFR parameters and have observed a 2.8x-3.2x speedup in rendering on 4K displays. Second, I explore rendering acceleration through foveation for 4D light fields, which capture both spatial and angular rays, enabling free-viewpoint rendering and custom selection of the focal plane.
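The log-polar mapping that KFR builds on can be sketched as a coordinate transform around the gaze point. The function and parameter names below are illustrative assumptions, not the dissertation's kernel formulation.

```python
# Illustrative sketch: mapping screen coordinates to a normalized log-polar
# space around the gaze, so that resolution falls off with eccentricity.
import math

def to_log_polar(x, y, gaze_x, gaze_y, max_radius):
    """Map a screen point to (u, v) in [0, 1]^2, log-radial around the gaze."""
    dx, dy = x - gaze_x, y - gaze_y
    r = math.hypot(dx, dy)
    u = math.log(1.0 + r) / math.log(1.0 + max_radius)  # log radius in [0, 1]
    v = (math.atan2(dy, dx) + math.pi) / (2 * math.pi)  # angle in [0, 1]
    return u, v

# A point only 10 px from the gaze (out of 1000) already spans about a third
# of the u axis, which is what lets the periphery be rendered at low detail.
print(to_log_polar(110.0, 100.0, 100.0, 100.0, 1000.0))
```

KFR's contribution, per the abstract, is parameterizing this falloff with polynomial kernels; the plain logarithm here is just the simplest instance of the idea.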
I optimize the KFR algorithm by adjusting the weight of each slice in the light field so that it automatically selects the optimal foveation parameters for different images according to the gaze position. I have validated our approach on the rendering of light fields by carrying out both quantitative experiments and user studies. Our method achieves speedups of 3.47x-7.28x for different levels of foveation and different rendering resolutions. Third, I present a simple yet effective technique for further reducing the cost of foveated rendering by leveraging ocular dominance, the tendency of the human visual system to prefer scene perception from one eye over the other. Our new approach, eye-dominance-guided foveated rendering (EFR), renders the scene at a lower foveation level (with higher detail) for the dominant eye than for the non-dominant eye. Compared with traditional foveated rendering, EFR can be expected to provide superior rendering performance while preserving the same level of perceived visual quality. Finally, I present an approach that uses an end-to-end convolutional neural network, consisting of a concatenation of an encoder and a decoder, to reconstruct a 3D model of a human hand from a single RGB image. Previous work on hand mesh reconstruction suffers from a lack of training data. To train networks with full supervision, we fit a parametric hand model to 3D annotations, and we train the networks on RGB images with the fitted parametric model as supervision.
Our approach leads to significantly improved quality compared to state-of-the-art hand mesh reconstruction techniques.
Item COST-EFFECTIVE PROGNOSTICS AND HEALTH MONITORING OF LOCALLY DAMAGED PIPELINES WITH HIGH CONFIDENCE LEVEL (2020) Aria, Amin; Modarres, Mohammad; Azarm, Shapour; Mechanical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Localized pipeline damage, caused by degradation processes such as corrosion, is prevalent, can result in pipeline failure, and is expensive to monitor. To prevent pipeline failure, many Prognostics and Health Monitoring (PHM) approaches have been developed in which sensor networks (for online data gathering) and human inspections (for offline data gathering) are used separately. In this dissertation, a two-level (segment- and integrated-level) PHM approach for locally damaged pipelines is proposed in which both of these degradation data gathering schemes (i.e., detection methods) are considered simultaneously. The segment-level approach, in which damage behavior is considered uniform, consists of a static and a dynamic phase. In the static phase, a new optimization problem for the health monitoring layout design of locally damaged pipelines is formulated. The solution to this problem is an optimal configuration (or layout) of degradation detection methods with minimized health monitoring cost and maximized likelihood of damage detection. In the dynamic phase, given the optimal layout, an online fusion of high-frequency sensor data and low-frequency inspection information is conducted to estimate and then update the pipeline's Remaining Useful Life (RUL). Subsequently, the segment-level optimization formulation is modified to improve its scalability and to facilitate updating layouts in light of the online RUL estimates.
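One standard way to fuse a high-frequency sensor estimate with a low-frequency inspection estimate is inverse-variance weighting, shown below purely for illustration; the dissertation's actual fusion model is not reproduced here.

```python
# Illustration only: inverse-variance weighting, a textbook rule for fusing
# two noisy estimates of the same quantity (here, a damage size).

def fuse_estimates(mean_a, var_a, mean_b, var_b):
    """Fuse two estimates; the more precise one gets the larger weight."""
    w_a, w_b = 1.0 / var_a, 1.0 / var_b
    fused_mean = (w_a * mean_a + w_b * mean_b) / (w_a + w_b)
    fused_var = 1.0 / (w_a + w_b)
    return fused_mean, fused_var

# A noisy sensor estimate (variance 4.0) and a precise inspection (variance 1.0):
mean, var = fuse_estimates(2.0, 4.0, 1.0, 1.0)
print(mean, var)  # the fused mean is pulled toward the precise estimate
```

The fused variance is always smaller than either input variance, which is why combining the two data gathering schemes can tighten an RUL estimate.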
Finally, at the integrated level, the modified segment-level approach is used along with Stochastic Dynamic Programming (SDP) to produce an optimal set of layouts for a long pipeline consisting of multiple segments with different damage behaviors. Experimental data and several notional examples are used to demonstrate the performance of the proposed approaches. Synthetically generated damage data are used in two examples to demonstrate that the proposed segment-level layout optimization results in a more robust solution than single-detection approaches and deterministic methods. For the dynamic segment-level phase, acoustic emission sensor signals and microscopic images from a set of fatigue crack experiments are used to show that combining sensor- and image-based damage size estimates improves the accuracy of RUL estimation. Lastly, using synthetically generated damage data for three hypothetical pipeline segments, it is shown that the integrated-level approach provides an optimal set of layouts for several pipeline segments.
Item Adversarial Robustness and Robust Meta-Learning for Neural Networks (2020) Goldblum, Micah; Czaja, Wojciech; Mathematics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Despite the overwhelming success of neural networks for pattern recognition, these models behave categorically differently from humans. Adversarial examples, small perturbations that are often undetectable to the human eye, easily fool neural networks, demonstrating that neural networks lack the robustness of human classifiers. This thesis comprises three parts. First, we motivate the study of defenses against adversarial examples with a case study on algorithmic trading, in which robustness may be critical for security reasons.
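The kind of small perturbation described above can be illustrated with a linear toy model and a gradient-sign step, in the spirit of the classic fast gradient sign method (FGSM); the model and numbers are invented for this example and are not from the thesis.

```python
# Toy illustration (invented model and numbers): a small, gradient-sign
# perturbation flips the decision of a two-feature linear scorer.

def predict(w, x, b):
    """Linear score; positive means class 1, negative means class 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def fgsm_step(w, x, eps):
    """Move x against the score gradient, which for a linear model is just w."""
    return [xi - eps * (1.0 if wi > 0 else -1.0) for xi, wi in zip(x, w)]

w, b = [2.0, -1.0], 0.0
x = [0.3, 0.2]                     # original input, scored positive
x_adv = fgsm_step(w, x, eps=0.3)   # each feature moves by at most 0.3
print(predict(w, x, b) > 0, predict(w, x_adv, b) < 0)  # True True
```

For deep networks the gradient must be computed by backpropagation, but the principle is the same: a bounded, input-aligned nudge can cross the decision boundary.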
Second, we develop methods for hardening neural networks against an adversary, especially in the low-data regime, where meta-learning methods achieve state-of-the-art results. Finally, we discuss several properties of the neural network models we use. These properties are of interest beyond robustness to adversarial examples, and they extend to the broad setting of deep learning.
Item DATA-DRIVEN STUDIES OF TRANSIENT EVENTS AND APERIODIC MOTIONS (2019) Wang, Rui; Balachandran, Balakumar; Mechanical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
The era of big data, high-performance computing, and machine learning has witnessed a paradigm shift from physics-based modeling to data-driven modeling across many scientific fields. In this dissertation work, transient events and aperiodic motions of complex nonlinear dynamical systems are studied with the aid of a data-driven modeling approach. The goal of the work has been to further the ability to predict future behavior, estimate states, and control related behaviors. It is shown that data on extreme waves can be used to carry out stability analysis and ascertain the nature of the transient phenomenon. In addition, it is demonstrated that a low number of soliton elements can be used to realize a rogue wave on the basis of nonlinear interactions amongst the basic elements. The proposed nonlinear phase interference model provides an appealing explanation for the formation of extreme ocean waves and their related statistics, and a superior reconstruction of the Draupner wave event compared with that obtained on the basis of linear superposition. Chaotic data, another manifestation of aperiodic motions, obtained from prototypical ordinary and partial differential equation systems, are considered, and a neural machine is realized to predict the corresponding responses based on a limited training set as well as to forecast the system behavior.
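A prototypical ordinary differential system of the kind mentioned above can be integrated to produce a chaotic, aperiodic time series. The Lorenz system with its classic parameters is used here as an illustrative stand-in; the dissertation's specific systems are not named in this abstract.

```python
# Illustrative stand-in: the Lorenz system (classic parameters), integrated
# with a simple Euler scheme, produces the kind of chaotic, aperiodic time
# series that a forecasting model would be trained on.

def lorenz_series(n_steps, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Return the x-component trajectory of the Lorenz system."""
    x, y, z = 1.0, 1.0, 1.0
    xs = []
    for _ in range(n_steps):
        dx = sigma * (y - x)
        dy = x * (rho - z) - y
        dz = x * y - beta * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        xs.append(x)
    return xs

series = lorenz_series(2000)  # 20 time units of bounded but aperiodic motion
print(len(series))
```

Nearby trajectories of such a system diverge exponentially, which is why long-horizon forecasting from short histories, as described above, is a demanding benchmark.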
A specific neural architecture, called the inhibitor mechanism, has been designed to enable chaotic time series forecasting. Without this mechanism, even short-term predictions would be intractable. Both autonomous and non-autonomous dynamical systems have been studied to demonstrate the long-term forecasting possibilities with the developed neural machine. For each dynamical system considered in this dissertation, a long forecasting horizon is achieved with a short historical data set. Furthermore, with the developed neural machine, one can relax the requirement of continuous historical data measurements, thus providing a more pragmatic approach than previous approaches available in the literature. It is expected that the efforts of this dissertation work will lead to a better understanding of the underlying mechanisms of transient and aperiodic events in complex systems and to useful techniques for forecasting their future occurrences.
Item Modeling Deep Context in Spatial and Temporal Domain (2018) Dai, Xiyang; Davis, Larry S.; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Context has been one of the most important aspects of computer vision research because it provides useful guidance for solving a variety of tasks in both the spatial and temporal domains. With the recent rise of deep learning methods, deep networks have shown impressive performance on many computer vision tasks. Modeling deep context explicitly and implicitly in deep networks can further boost the effectiveness and efficiency of deep models. In the spatial domain, implicitly modeling context can be useful for learning discriminative texture representations. We present an effective deep fusion architecture that captures both the first- and second-order statistics of texture features. Meanwhile, explicitly modeling context can also be important for challenging tasks such as fine-grained classification.
We then present a deep multi-task network that explicitly captures geometric constraints by simultaneously conducting fine-grained classification and key-point localization. In the temporal domain, explicitly modeling context can be crucial to activity recognition and localization. We present a temporal context network that explicitly captures the relative context around a proposal, sampling two temporal scales pairwise for precise temporal localization of human activities. Meanwhile, implicitly modeling context can lead to better network architectures for video applications. We then present a temporal aggregation network that learns a deep hierarchical representation for capturing temporal consistency. Finally, we conduct research on jointly modeling context in both the spatial and temporal domains for human action understanding, which requires predicting where, when, and what a human action happens in a crowded scene. We present a decoupled framework with dedicated branches for spatial localization and temporal recognition. Contexts in the spatial and temporal branches are modeled explicitly and fused together later to generate the final predictions.