Theses and Dissertations from UMD

Permanent URI for this community: http://hdl.handle.net/1903/2

New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means that there may be up to a 4-month delay in the appearance of a given thesis/dissertation in DRUM.

More information is available at Theses and Dissertations at University of Maryland Libraries.

Search Results

Now showing 1 - 10 of 15
  • Item
    DEEP LEARNING APPROACHES FOR ESTIMATING AND FORECASTING SURFACE DOWNWARD SHORTWAVE RADIATION FROM SATELLITE DATA
    (2024) Li, Ruohan; Wang, Dongdong; Geography; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Surface downward shortwave radiation (DSR) designates solar radiation with a wavelength from 300 to 4000 nm received at the Earth’s surface. DSR plays a pivotal role in the surface energy and radiation budget, serving as the primary driver of hydrological, ecological, and biogeochemical cycles (Liang et al., 2010, 2019) and as an important input to various Earth models (Huang et al., 2019; Liang et al., 2010; Stephens et al., 2012). Given the rising demand for renewable energy and the accelerated advancement of solar energy technologies at both utility and residential scales, precision and resolution in estimating and forecasting DSR have become indispensable for planning and administering solar power plants (Gueymard, 2014; Jiang et al., 2019). This dissertation delves into the potential of integrating deep learning with satellite observations to address the deficiencies of current DSR estimation and forecasting methods, aiming to meet the evolving needs of solar radiation estimation. The research begins by examining current DSR satellite products, emphasizing their limitations, particularly concerning spatial resolution and performance in snowy, cloudy, and high-latitude areas. In such regions, challenges arise from the degradation of radiative transfer models, band saturation, the pronounced effects of 3D cloud dynamics, and temporal resolution constraints (Li et al., 2021). Identifying these gaps, the study introduces the concept of transfer learning to tackle cases where physical methods degrade and limited training data is available. By combining data from physical simulations and ground observations, the proposed models enhance both the accuracy and adaptability of DSR predictions on a global scale. The investigation further reveals the influence of training data volume on model performance, illustrating how transfer learning can ameliorate these effects (Li et al., 2022). Moreover, the dissertation compares the application of DenseNet, the Gated Recurrent Unit (GRU), and a hybrid of a Convolutional Neural Network (CNN) and GRU (CNNGRU) to geostationary satellite data, achieving precise and timely DSR estimates. These models demonstrate their strength in tackling 3D cloud effects and in reducing dependency on additional data sources by exploiting the spatial and temporal structure of the deep learning architectures (Li et al., 2023b). Finally, the dissertation introduces SolarFormer, a space-time transformer neural network adept at forecasting solar radiation up to three hours in advance at 15-minute intervals. By harnessing solely geostationary satellite imagery, without the need for ground measurements, this model facilitates expansive DSR predictions, which are crucial for optimizing solar energy distribution at both utility and micro scales. This chapter also highlights the Transformer model's potential for extended forecasting due to its computational and memory efficiency.
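For readers unfamiliar with the CNN-GRU hybrid mentioned in the abstract, the sketch below illustrates the general idea of coupling a per-timestep convolutional encoder with a recurrent unit over a short sequence of geostationary image patches to regress instantaneous DSR. The band count, patch size, sequence length, and layer widths are illustrative assumptions, not the dissertation's actual configuration.

```python
# Minimal sketch of a CNN-GRU hybrid for estimating downward shortwave
# radiation (DSR) from a sequence of geostationary satellite image patches.
# All sizes are illustrative assumptions.
import torch
import torch.nn as nn

class CNNGRU(nn.Module):
    def __init__(self, n_bands=6, hidden=64):
        super().__init__()
        # Per-timestep spatial encoder: captures cloud texture around the target pixel.
        self.cnn = nn.Sequential(
            nn.Conv2d(n_bands, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> (B, 64, 1, 1)
        )
        # Temporal encoder: models the evolution of the cloud field across timesteps.
        self.gru = nn.GRU(input_size=64, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)  # instantaneous DSR (e.g., W/m^2)

    def forward(self, x):
        # x: (batch, time, bands, H, W)
        b, t, c, h, w = x.shape
        feats = self.cnn(x.reshape(b * t, c, h, w)).reshape(b, t, -1)
        out, _ = self.gru(feats)
        return self.head(out[:, -1])  # prediction at the last observation time

# Example: 8 samples, 4 timesteps, 6 bands, 32x32 patches.
dsr = CNNGRU()(torch.randn(8, 4, 6, 32, 32))
print(dsr.shape)  # torch.Size([8, 1])
```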
  • Item
    The Limitations of Deep Learning Methods in Realistic Adversarial Settings
    (2023) Kaya, Yigitcan; Dumitras, Tudor; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The study of adversarial examples has evolved from a niche phenomenon to a well-established branch of machine learning (ML). In the conventional view of an adversarial attack, the adversary takes an input sample, e.g., an image of a dog, and applies a deliberate transformation to this input, e.g., a rotation. This then causes the victim model to abruptly change its prediction, e.g., the rotated image is classified as a cat. Most prior work has adapted this view across different applications and provided powerful attack algorithms as well as defensive strategies to improve robustness. The progress in this domain has been influential for both research and practice, and it has produced a perception of better security. Yet, security literature tells us that adversaries often do not follow a specific threat model and adversarial pressure can exist in unprecedented ways. In this dissertation, I will start from the threats studied in security literature to highlight the limitations of the conventional view and extend it to capture realistic adversarial scenarios. First, I will discuss how adversaries can pursue goals other than hurting the predictive performance of the victim. In particular, an adversary can wield adversarial examples to perform denial-of-service against emerging ML systems that rely on input-adaptiveness for efficient predictions. Our attack algorithm, DeepSloth, can transform the inputs to offset the computational benefits of these systems. Moreover, an existing conventional defense is ineffective against DeepSloth and poses a trade-off between efficiency and security. Second, I will show how the conventional view leads to a false sense of security for anomalous input detection methods. These methods build modern statistical tools around deep neural networks and have been shown to be successful in detecting conventional adversarial examples. As a general-purpose analogue of blending attacks in security literature, we introduce the Statistical Indistinguishability Attack (SIA). SIA bypasses a range of published detection methods by producing anomalous samples that are statistically similar to normal samples. Third, and finally, I will focus on malware detection with ML, a domain where adversaries gain leverage over ML naturally without deliberately perturbing inputs as in the conventional view. Security vendors often rely on ML for automating malware detection due to the large volume of new malware. A standard approach for detection is collecting runtime behaviors of programs in controlled environments (sandboxes) and feeding them to an ML model. I first observed that a model trained using this approach performs poorly when it is deployed on program behaviors from realistic, uncontrolled environments, which gives malware authors an advantage in causing harm. We attribute this deterioration to distribution shift and investigate possible improvements by adapting modern ML techniques, such as distributionally robust optimization. Overall, my dissertation work has reinforced the importance of considering comprehensive threat models and applications with well-documented adversaries for properly assessing the security risks of ML.
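The DeepSloth attack targets input-adaptive (multi-exit) models. As context, the sketch below shows the kind of early-exit inference such systems use: a cheap internal classifier answers confident inputs early, so an adversary that keeps every exit below its confidence threshold forces the full, expensive forward pass. The toy architecture and threshold are assumptions; the attack algorithm itself is not reproduced here.

```python
# Minimal sketch of input-adaptive (multi-exit) inference, the efficiency
# mechanism that a slowdown attack like DeepSloth tries to offset.
# Architecture and confidence threshold are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiExitNet(nn.Module):
    def __init__(self, n_classes=10, threshold=0.9):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.block2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.exit1 = nn.Linear(16, n_classes)   # cheap internal classifier
        self.exit2 = nn.Linear(32, n_classes)   # final, most expensive classifier
        self.threshold = threshold

    def forward(self, x):
        h1 = self.block1(x)
        logits1 = self.exit1(h1.mean(dim=(2, 3)))
        # Early exit when the internal classifier is confident enough;
        # adversarial inputs crafted to stay below this threshold never exit early.
        if F.softmax(logits1, dim=1).max() >= self.threshold:
            return logits1, "exit-1 (cheap)"
        h2 = self.block2(h1)
        return self.exit2(h2.mean(dim=(2, 3))), "exit-2 (full cost)"

net = MultiExitNet().eval()
with torch.no_grad():
    _, path = net(torch.randn(1, 3, 32, 32))
print(path)
```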
  • Item
    DEVELOPMENT OF ARTIFICIAL INTELLIGENCE AUGMENTED METAL-ORGANIC FRAMEWORK-BASED SYSTEMS AND THEIR APPLICATIONS IN FOOD SECTORS
    (2022) Ma, Peihua; Wang, Qin Q. W.; Food Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Metal-organic frameworks (MOFs) are a class of cutting-edge, designable porous scaffolding materials that have attracted attention in reticular chemistry and have satisfied fundamental demands of delivery research in recent years. In this research, the UiO-66 MOF family with different modifications was applied to food delivery systems and freshness monitoring. First, zirconium (IV) chloride and benzene-1,4-dicarboxylic acid were used to synthesize the Zr-based MOF UiO-66. Then, using a post-synthesis loading process, curcumin was encapsulated in it. The system attained a high loading capacity of 3.45% w/w, according to both spectroscopic and thermogravimetric measurements. X-ray diffraction (XRD), a physisorption analyzer, scanning electron microscopy (SEM), and energy-dispersive X-ray spectrometry (EDS) were used to characterize the crystal structure, porosity, and morphology of the curcumin delivery system. Curcumin was shown to be released in a controlled manner in simulated intestinal fluids in an in vitro digestion test: after 180 minutes of digestion, almost 60% of the curcumin was released. Second, two types of curcumin-loaded UiO-66 (a representative highly biocompatible and water-stable metal-organic framework) delivery systems were prepared, a curcumin-loaded UiO-66 Pickering emulsion and a curcumin-loaded UiO-66 high internal-phase Pickering emulsion (HIPPE), named curcumin@UiO-66 PE and curcumin@UiO-66 HIPPE, respectively. The loading capacities of the two delivery systems reached 7.33% and 26.18% w/w, respectively. Both systems were characterized using XRD, a physisorption analyzer, SEM, and EDS for crystallography, morphology, and physicochemical properties, with computer-assisted optimization using DFT and GCMC simulations for maximum loading capacity. The results showed that both systems exhibited extremely high surface area and porosity, as well as strong chemical and thermal stability, demonstrating their great potential as food delivery systems. On this basis, the emulsion system was further optimized using the response surface method. These novel MOF-nanoparticle-stabilized delivery systems could be practically utilized for other bioactive components and antimicrobial agents, and could find applications in functional food, food safety, and biomedical areas in the future. Third, incorporating or positioning multi-functional MOFs into smart packaging is one of the next steps toward commercial application of reticular chemistry. Here, a cheap and versatile method to incorporate MOFs into smart food packages via generic patterning was developed. Meanwhile, deep convolutional neural networks (DCNNs) were combined with the sensors to form a food freshness monitoring system based on scent fingerprint recognition. The ice-template-based UiO-66-Br/chitosan sensor array and the MOF-MMM-based UiO-66-OH/PVA sensor array, each comprising 6 different dyes absorbed in the MOF matrix, formed scent fingerprints identifiable by DCNNs. Several state-of-the-art DCNN models were trained for shrimp freshness monitoring using 31,584 labeled images, with 13,537 images for testing. The highest accuracy achieved was 99.94%, by the Wide-Slice Residual Network 50 (WISeR50). The MOF-MMM-based sensor array showed a similar result, with chicken freshness estimation reaching 98.95%. These platforms are intuitive, fast, accurate, and non-destructive, enabling consumers to monitor food freshness.
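As a rough illustration of the scent-fingerprint recognition step, the sketch below fine-tunes a generic CNN backbone on labeled photographs of a colorimetric sensor array. The directory layout, class names, and ResNet-50 backbone are assumptions; the WISeR50 model and the dissertation's datasets are not reproduced.

```python
# Minimal sketch of training a CNN to classify food freshness from photos of a
# MOF dye sensor array. Folder layout and backbone are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
# Expects folders such as sensor_images/fresh, sensor_images/less_fresh, sensor_images/spoiled
train_set = datasets.ImageFolder("sensor_images", transform=tfm)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet50(weights=None)  # generic backbone stand-in (assumption)
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:   # one pass over the labeled sensor-array images
    opt.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    opt.step()
```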
  • Item
    Deep-Learning Based Image Analysis on Resource-Constrained Systems
    (2021) Lee, Eung Joo; Bhattacharyya, Shuvra S; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    In recent years, deep learning has led to high-end performance on a very wide variety of computer vision tasks. Among different types of deep neural networks, convolutional neural networks (CNNs) are extensively studied and utilized for image analysis purposes, as CNNs have the capability to effectively capture spatial and temporal dependencies in images. The growth in the amount of annotated image data and improvements in graphics processing units are factors in the rapid gain in popularity of CNN-based image analysis systems. This growth in turn motivates investigation into the application of CNN-based deep learning to increasingly complex tasks, including an increasing variety of applications at the network edge. The application of deep CNNs to novel edge applications involves two major challenges. First, in many of the emerging edge-based application areas, there is a lack of sufficient training data or an uneven class balance within the datasets. Second, stringent implementation constraints --- including constraints on real-time performance, memory requirements, and energy consumption --- must be satisfied to enable practical deployment. In this thesis, we address these challenges in developing deep-CNN-based image analysis systems for deployment on resource-constrained devices at the network edge. To tackle the challenges for medical image analysis, we first propose a methodology and tool for semi-automated training dataset generation in support of robust segmentation. The framework is developed to provide robust segmentation of surgical instruments using deep learning. We then address the problem of training dataset generation for real-time object tracking using a weakly supervised learning method. In particular, we present a weakly supervised method for surgical tool tracking based on a class of hybrid sensor systems. The targeted class of systems combines electromagnetic (EM) and vision-based modalities. Furthermore, we present a new framework for assessing the quality of nonrigid multimodality image registration in real-time. With the augmented dataset, we construct a solution using various registration quality metrics that are integrated to form a single binary assessment of image registration effectiveness as either high quality or low quality. To address challenges in practical deployment, we present a deep-learning-based hyperspectral image (HSI) classification method that is designed for deployment on resource-constrained devices at the network edge. Due to the large volumes of data produced by HSI sensors, and the complexity of deep neural network (DNN) architectures, developing DNN solutions for HSI classification on resource-constrained platforms is a challenging problem. In this part of the thesis, we introduce a novel approach that integrates DNN-based image analysis with discrete cosine transform (DCT) analysis for HSI classification. In addition to medical image processing and HSI classification, a third application area that we investigate in this thesis is on-board object detection from Unmanned Aerial Vehicles (UAVs), which represents another important domain of interest for the edge-based deployment of CNN methods. In this part of the thesis, we present a novel framework for object detection using images captured from UAVs. The framework is optimized using synthetic datasets that are generated from a game engine to capture imaging scenarios that are specific to the UAV-based operating environment. Using the generated synthetic dataset, we develop new insights into the impact of different UAV-based imaging conditions on object detection performance.
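The HSI classification work integrates DCT analysis with a DNN. The sketch below illustrates one plausible reading of that combination: compressing each pixel's spectrum with a DCT along the band axis and feeding the retained low-frequency coefficients to a small classifier. Band count, number of retained coefficients, and layer sizes are assumptions.

```python
# Minimal sketch of DCT-compressed hyperspectral pixel classification.
# All dimensions are illustrative assumptions.
import numpy as np
import torch
import torch.nn as nn
from scipy.fft import dct

n_pixels, n_bands, n_classes, n_keep = 1024, 200, 9, 32
spectra = np.random.rand(n_pixels, n_bands).astype(np.float32)  # stand-in pixel spectra

# The DCT compacts the smooth spectral curve into a few low-frequency
# coefficients, shrinking the input that the on-edge DNN has to process.
coeffs = dct(spectra, type=2, norm="ortho", axis=1)[:, :n_keep].astype(np.float32)

classifier = nn.Sequential(
    nn.Linear(n_keep, 64), nn.ReLU(),
    nn.Linear(64, n_classes),
)
logits = classifier(torch.from_numpy(coeffs))
print(logits.shape)  # torch.Size([1024, 9])
```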
  • Item
    Human-Centric Deep Generative Models: The Blessing and The Curse
    (2021) Yu, Ning; Davis, Larry; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Over the past years, deep neural networks have achieved significant progress in a wide range of real-world applications. In particular, my research puts a focused lens on deep generative models, a neural network solution that proves effective in visual (re)creation. But is generative modeling a niche topic that should be researched on its own? My answer is critically no. In the thesis, I present the two sides of deep generative models, their blessing and their curse to human beings. Regarding what deep generative models can do for us, I demonstrate the improvement in performance and steerability of visual (re)creation. Regarding what we can do for deep generative models, my answer is to mitigate the security concerns of DeepFakes and improve the minority inclusion of deep generative models. For the performance of deep generative models, I probe applying attention modules and dual contrastive loss to generative adversarial networks (GANs), which pushes photorealistic image generation to a new state of the art. For the steerability, I introduce Texture Mixer, a simple yet effective approach to achieve steerable texture synthesis and blending. For the security, my research spans a series of GAN fingerprinting solutions that enable the detection and attribution of GAN-generated image misuse. For the inclusion, I investigate the biased misbehavior of generative models and present my solution for enhancing the minority inclusion of GAN models over underrepresented image attributes. All in all, I propose to project actionable insights onto the applications of deep generative models, and finally contribute to human-generator interaction.
  • Item
    Robust Learning under Distributional Shifts
    (2021) Balaji, Yogesh; Chellappa, Rama; Feizi, Soheil; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Designing robust models is critical for reliable deployment of artificial intelligence systems. Deep neural networks perform exceptionally well on test samples that are drawn from the same distribution as the training set. However, they perform poorly when there is a mismatch between training and test conditions, a phenomenon called distributional shift. For instance, the perception system of a self-driving car can produce erratic predictions when it encounters a new test sample with a different illumination or weather condition not seen during training. Such inconsistencies are undesirable, and can potentially create life-threatening conditions as these models are deployed in safety-critical applications. In this dissertation, we develop several techniques for effectively handling distributional shifts in deep learning systems. In the first part of the dissertation, we focus on detecting out-of-distribution shifts that can be used for flagging outlier samples at test-time. We develop a likelihood estimation framework based on deep generative models for this task. In the second part, we study the domain adaptation problem, where the objective is to tune the neural network models to adapt to a specific target distribution of interest. We design novel adaptation algorithms and analyze them under various settings. In the last part of the dissertation, we develop robust learning algorithms that can generalize to novel distributional shifts. In particular, we focus on two types of shifts: covariate and adversarial shifts. All developed algorithms are rigorously evaluated on several benchmark datasets.
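The out-of-distribution detection part relies on likelihoods from a deep generative model. The sketch below shows the flagging logic only, with a multivariate Gaussian standing in for the generative model: score test samples under a density fit to in-distribution data and flag those whose log-likelihood falls below a validation-set percentile. The stand-in density and threshold percentile are assumptions.

```python
# Minimal sketch of likelihood-based out-of-distribution flagging.
# A Gaussian stands in for the deep generative model; the 5th-percentile
# threshold is an illustrative assumption.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(5000, 8))   # in-distribution features
val = rng.normal(0.0, 1.0, size=(1000, 8))
test = np.vstack([rng.normal(0.0, 1.0, size=(5, 8)),
                  rng.normal(6.0, 1.0, size=(5, 8))])  # last 5 rows are shifted

density = multivariate_normal(mean=train.mean(0), cov=np.cov(train, rowvar=False))
threshold = np.percentile(density.logpdf(val), 5)  # flag the lowest 5% of likelihoods

flags = density.logpdf(test) < threshold
print(flags)  # the shifted samples should be flagged True
```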
  • Item
    FAIR URBAN CRIME PREDICTION WITH HUMAN MOBILITY BIG DATA
    (2021) Wu, Jiahui; Frias-Martinez, Vanessa; Library & Information Services; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Crime imposes significant costs on society. Reported crime data is important in quantifying the severity of crimes, and based on it decision-makers allocate resources for crime interventions. Human mobility big data has triggered interest in various fields in studying the relationship between urban crimes and mobility at a large scale, especially the predictive power of mobility for urban crimes. This research direction can enrich our understanding of crimes and better inform crime-related decision-making. One concern about reported crime data is bias. The bias could be produced by differences in residents’ willingness to report potential crime incidents and in police activity across neighborhoods. While many studies about crime prediction are aware of biases in reported crimes, few propose solutions to address or mitigate this issue or evaluate how it affects prediction models in terms of accuracy or fairness. My dissertation research aims to explore the potential of human mobility big data for crime prediction. Specifically, my dissertation advances the state of the art by addressing three challenges in mobility-based crime prediction: 1) Constructing mobility features can be sensitive to different methodological choices. Without careful examination of these choices, there might be conflicting findings. One critical area of mobility analysis for predicting crime is the identification of urban hotspots. Therefore, my work performs a systematic spatial sensitivity analysis of the impact of these choices and provides guidelines to identify the most stable ones. 2) Under-reporting generates biases in reported crime data. To address such bias, I develop a Bayesian model for long-term crime prediction that infers the unobserved true number of crime incidents. Comprehensive experiments show how the accuracy and fairness of long-term crime prediction are affected by modeling the under-reporting of crimes. 3) Although empirical studies show promising results about the relationship between human mobility and long-term crime prediction, the effects of mobility features on short-term crime prediction have yet to be explored. Therefore, my work conducts a series of experiments to explore how incorporating mobility features into short-term crime prediction models affects their performance in terms of accuracy and fairness.
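To make the under-reporting issue concrete, the sketch below simulates crime counts that are thinned by a neighborhood-level reporting probability and shows how accounting for that probability de-biases the estimated rate, which is the intuition behind the Bayesian model described above. The rates and reporting probabilities are illustrative assumptions, not results from the dissertation.

```python
# Minimal sketch of the under-reporting idea: observed counts are a thinned
# version of true counts, so a model that includes the reporting rate can
# recover the unobserved true intensity. All numbers are made up.
import numpy as np

rng = np.random.default_rng(1)
n_neighborhoods, n_months = 50, 36
true_rate = rng.gamma(shape=2.0, scale=5.0, size=n_neighborhoods)  # incidents/month
report_prob = rng.uniform(0.4, 0.9, size=n_neighborhoods)          # neighborhood reporting rate

true_counts = rng.poisson(true_rate[:, None], size=(n_neighborhoods, n_months))
observed = rng.binomial(true_counts, report_prob[:, None])          # under-reported data

# Poisson thinning: E[observed] = report_prob * true_rate, so dividing the
# observed mean by the reporting probability de-biases the rate estimate.
naive_estimate = observed.mean(axis=1)
corrected_estimate = observed.mean(axis=1) / report_prob

print("mean abs error, naive:    ", np.abs(naive_estimate - true_rate).mean().round(2))
print("mean abs error, corrected:", np.abs(corrected_estimate - true_rate).mean().round(2))
```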
  • Item
    Closing the Gap Between Classification and Retrieval Models
    (2021) Taha, Ahmed; Davis, Larry; Shrivastava, Abhinav; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Retrieval networks learn a feature embedding where similar samples are close together and different samples are far apart. This feature embedding is essential for computer vision applications such as face/person recognition, zero-shot learning, and image retrieval. Despite these important applications, retrieval networks are less popular than classification networks for multiple reasons: (1) The cross-entropy loss, used with classification networks, is more stable and converges faster than the metric learning losses used with retrieval networks. (2) The cross-entropy loss has a huge toolbox of utilities and extensions. For instance, both AdaCos and self-knowledge distillation have been proposed to tackle low sample complexity in classification networks; also, both CAM and Grad-CAM have been proposed to visualize attention in classification networks. To promote retrieval networks, it is important to equip them with an equally powerful toolbox. Accordingly, we propose an evolution-inspired approach to tackle low sample complexity in feature embedding. Then, we propose SVMax to regularize the feature embedding and avoid model collapse. Furthermore, we propose L2-CAF to visualize attention in retrieval networks. To tackle low sample complexity, we propose an evolution-inspired training approach to boost performance on relatively small datasets. The knowledge evolution (KE) approach splits a deep network into two hypotheses: the fit-hypothesis and the reset-hypothesis. We iteratively evolve the knowledge inside the fit-hypothesis by perturbing the reset-hypothesis for multiple generations. This approach not only boosts performance but also learns a slim (pruned) network with a smaller inference cost. KE reduces both overfitting and the burden of data collection. To regularize the feature embedding and avoid model collapse, we propose singular value maximization (SVMax) to promote a uniform feature embedding. Our formulation mitigates model collapse and enables larger learning rates. SVMax is oblivious to both the input class (labels) and the sampling strategy. Thus it promotes a uniform feature embedding in both supervised and unsupervised learning. Furthermore, we present a mathematical analysis of the mean singular value's lower and upper bounds. This analysis makes tuning SVMax's balancing hyperparameter easier when the feature embedding is normalized to the unit circle. To support retrieval networks with a visualization tool, we formulate attention visualization as a constrained optimization problem. We leverage the unit L2-norm constraint as an attention filter (L2-CAF) to localize attention in both classification and retrieval networks. This approach imposes no constraints on the network architecture besides having a convolution layer. The input can be a regular image or a pre-extracted convolutional feature. The network output can be logits trained with cross-entropy or a space embedding trained with a ranking loss. Furthermore, this approach neither changes the original network weights nor requires fine-tuning. Thus, network performance remains intact. The visualization filter is applied only when an attention map is required, so it poses no computational overhead during inference. L2-CAF visualizes the attention of the last convolutional layer of GoogLeNet within 0.3 seconds. Finally, we propose a compromise between retrieval and classification networks: a simple yet effective two-head architecture, a network with both logits and feature-embedding heads. The embedding head, trained with a ranking loss, limits the overfitting capabilities of the cross-entropy loss by promoting a smooth embedding space. In our work, we leverage the semi-hard triplet loss to allow a dynamic number of modes per class, which is vital when working with imbalanced data. Also, we refute a common assumption that training with a ranking loss is computationally expensive. By moving both the triplet loss sampling and computation to the GPU, the training time increases by just 2%.
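The sketch below illustrates an SVMax-style regularizer as described in the abstract: the mean singular value of the L2-normalized embedding matrix is maximized alongside the main loss to keep the embedding spread out and discourage collapse. The toy encoder, stand-in ranking loss, and balancing hyperparameter are assumptions, not the dissertation's exact formulation.

```python
# Minimal sketch of an SVMax-style regularizer: maximize the mean singular
# value of the normalized embedding matrix to promote a uniform embedding.
# Encoder, stand-in loss, and weighting are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def svmax_term(embeddings):
    # embeddings: (batch, dim); rows normalized onto the unit hypersphere.
    e = F.normalize(embeddings, dim=1)
    return torch.linalg.svdvals(e).mean()  # larger mean singular value = less collapsed

encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))
opt = torch.optim.SGD(encoder.parameters(), lr=0.01)
lam = 0.1  # balancing hyperparameter (assumption)

x = torch.randn(256, 128)
emb = encoder(x)
ranking_loss = emb.pow(2).mean()             # stand-in for a metric-learning loss
loss = ranking_loss - lam * svmax_term(emb)  # subtract: we want to *maximize* the mean SV
opt.zero_grad()
loss.backward()
opt.step()
print(float(svmax_term(encoder(x))))
```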
  • Item
    GLOBAL BARE GROUND GAIN BETWEEN 2000 AND 2012 AND THE RELATIONSHIP WITH SOCIOECONOMIC DEVELOPMENT
    (2020) Ying, Qing; Hansen, Matthew C; Geography; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Bare ground gain, the complete removal of vegetation due to land use change, represents an extreme land cover transition that completely alters the structure and functioning of ecosystems. The fast expansion of bare ground cover is directly associated with increasing population and urbanization, resulting in accelerated greenhouse gas emissions, an intensified urban heat island phenomenon, and extensive habitat fragmentation and loss. While the economic return of settlement and infrastructure construction has improved human livelihoods, the negative impacts on the environment have disproportionately affected vulnerable populations, creating inequality and tension in society. The area, distribution, drivers, and change rates of global bare ground gain had not been systematically quantified; neither had the relationship between such dynamics and socioeconomic development. This dissertation seeks methods for operational characterization of bare ground expansion, advances our understanding of the magnitudes, dynamics, and drivers of global bare ground gain between 2000 and 2012, and uncovers the implications of such change for macro-economic development monitoring, all through Landsat satellite observations. The approach, which employs wall-to-wall maps of bare ground gain classified from Landsat imagery for probability sample selection, proved particularly effective for unbiased area estimation of global, continental, and national bare ground gain as a small land cover and land use change theme. Anthropogenic land uses accounted for 95% of the global bare ground gain, largely consisting of commercial/residential built-up, infrastructure development, and resource extraction. China and the United States topped the total area increase in bare ground. Annual change rates of anthropogenic bare ground gain were found to be a leading indicator of macro-economic change in the study period, which was dominated by the 2007-2008 global financial crisis, through econometric analysis between annual gains in bare ground of different land use outcomes and economic fluctuations in business cycles measured by detrended economic variables. Instead of intensive manual interpretation of the land-use attributes of the probability sample, an approach integrating pixel- and object-based deep learning algorithms is proposed and shown to be feasible for automatic attribution of airports, a transportation land use of economic importance.
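The unbiased area estimation mentioned above follows a standard stratified, probability-sample design: the map defines strata, reference labels are collected for sampled pixels, and the class area is the area-weighted sum of per-stratum reference proportions. The sketch below works through that estimator with made-up stratum areas, sample sizes, and labels; it is not the dissertation's sample or result.

```python
# Minimal sketch of stratified area estimation from a probability sample.
# Stratum areas, sample sizes, and reference labels are illustrative assumptions.
import numpy as np

# Stratum 0: mapped "no gain"; stratum 1: mapped "bare ground gain".
stratum_area_km2 = np.array([950_000.0, 50_000.0])
# Reference labels (1 = gain) interpreted for the sampled pixels in each stratum.
ref_labels = [np.array([0]*196 + [1]*4),   # rare omission error in the large stratum
              np.array([1]*85 + [0]*15)]   # some commission error in the gain stratum

p_hat = np.array([lab.mean() for lab in ref_labels])   # per-stratum gain proportion
area_gain = float(np.sum(stratum_area_km2 * p_hat))    # area-weighted estimate

# Standard error of the stratified estimator.
n = np.array([len(lab) for lab in ref_labels])
se = float(np.sqrt(np.sum(stratum_area_km2**2 * p_hat * (1 - p_hat) / (n - 1))))
print(f"estimated bare ground gain: {area_gain:,.0f} +/- {1.96*se:,.0f} km^2 (95% CI)")
```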
  • Item
    AN ANALYSIS OF BOTTOM-UP ATTENTION MODELS AND MULTIMODAL REPRESENTATION LEARNING FOR VISUAL QUESTION ANSWERING
    (2019) Narayanan, Venkatraman; Shrivastava, Abhinav; Systems Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    A Visual Question Answering (VQA) task is the ability of a system to take an image and an open-ended, natural language question about the image and provide a natural language text answer as the output. VQA is a relatively nascent field, with only a few strategies explored. The performance of VQA systems, in terms of the accuracy of answers to image-question pairs, requires a considerable overhaul before such systems can be used in practice. The general system for performing the VQA task consists of an image encoder network, a question encoder network, a multi-modal attention network that combines the information obtained from the image and the question, and an answering network that generates natural language answers for the image-question pair. In this thesis, we follow two strategies to improve the performance (accuracy) of VQA. The first is a representation learning approach (utilizing state-of-the-art Generative Adversarial Networks (GANs) (Goodfellow et al., 2014)) to improve the image encoding system of VQA. This thesis evaluates four variants of GANs to identify a GAN architecture that best captures the data distribution of the images; it was determined that the GAN variants become unstable and fail to provide a viable image encoding system for VQA. The second strategy is to evaluate an alternative approach to the attention network, using multi-modal compact bilinear pooling, in the existing VQA system. The second strategy led to an increase in the accuracy of VQA by 2% compared to the current state-of-the-art technique.
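Multi-modal compact bilinear pooling approximates the outer-product interaction between image and question features by count-sketching each modality and circularly convolving the sketches in the frequency domain. The sketch below shows that computation; the feature dimensions and random toy inputs are assumptions, not the thesis's actual encoders.

```python
# Minimal sketch of multi-modal compact bilinear pooling (count sketch + FFT).
# Dimensions and toy inputs are illustrative assumptions.
import torch

def count_sketch(x, h, s, d):
    # x: (batch, n); h: index hash (n,); s: +/-1 signs (n,)
    out = torch.zeros(x.size(0), d, dtype=x.dtype)
    return out.index_add_(1, h, x * s)

torch.manual_seed(0)
n_img, n_txt, d = 2048, 1024, 8000
h_img, s_img = torch.randint(0, d, (n_img,)), torch.randint(0, 2, (n_img,)).float() * 2 - 1
h_txt, s_txt = torch.randint(0, d, (n_txt,)), torch.randint(0, 2, (n_txt,)).float() * 2 - 1

image_feat = torch.randn(4, n_img)     # e.g. CNN image features (stand-in)
question_feat = torch.randn(4, n_txt)  # e.g. RNN question features (stand-in)

sk_img = count_sketch(image_feat, h_img, s_img, d)
sk_txt = count_sketch(question_feat, h_txt, s_txt, d)
# Circular convolution of the two sketches via FFT gives the compact bilinear feature.
fused = torch.fft.irfft(torch.fft.rfft(sk_img) * torch.fft.rfft(sk_txt), n=d)
print(fused.shape)  # torch.Size([4, 8000])
```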