Theses and Dissertations from UMD
Permanent URI for this community: http://hdl.handle.net/1903/2
New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means that there may be up to a 4-month delay in the appearance of a given thesis/dissertation in DRUM.
More information is available at Theses and Dissertations at University of Maryland Libraries.
Search Results (13 results)
Item: Collective dynamics of astrocyte and cytoskeletal systems (2024)
Mennona, Nicholas John; Losert, Wolfgang; Physics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

Advances in imaging and biological sample preparation now allow researchers to study collective behavior in cellular networks with unprecedented detail. Imaging the electrical signaling of neuronal networks at the cellular level has generated exciting insights into the multiscale interactions within the brain. This thesis aims at a complementary view of the brain's general information processing, focusing on non-electrical modes of information: the collective, dynamical characteristics of non-electrically active, non-neuronal brain cells and of mechanical systems. Astrocytes are the non-neuronal brain cells studied here, and the cytoskeleton, which consists of various filamentous networks, is the dynamic mechanical system; the two filamentous networks studied herein are the actin cytoskeleton and the microtubule network. Techniques from calcium imaging and cell mechanics are adapted to measure these often overlooked information channels, which operate at length scales and timescales distinct from those of electrical information transmission.

Structural images of astrocyte actin, image sequences of microtubule structure, and the calcium signals of collections of astrocytes are analyzed using computer vision and information theory. Measuring the alignment of actin filaments with nearby boundaries reveals that stellate astrocytes have more perpendicularly oriented actin than undifferentiated astrocytes. Harnessing the larger length scale and slower dynamical time scale of microtubule filaments relative to actin filaments led to a computer vision tool that measures lateral filamentous fluctuations. Finally, we adapt information theory to the analog calcium (Ca2+) signals within astrocyte networks classified according to subtype. We find that, despite multiple physiological differences between immature and injured astrocytes, stellate (healthy) astrocytes have the same speed of information transport as these other astrocyte subtypes. This uniformity in speed persists when either the cytoskeleton (Latrunculin B) or the energy state (ATP) is perturbed. Astrocytes, regardless of physiological subtype, tend to behave similarly when active under normal conditions. However, healthy astrocytes respond most strongly to energy perturbation, relative to immature and injured astrocytes, as viewed through cross-correlation, mutual information, and partitioned entropy.

These results indicate the value of drawing information from both structure and dynamics. We developed and adapted tools across scales, from the nanometer-scale alignment of actin filaments to information dynamics in astrocyte networks spanning hundreds of microns. Including all potential modalities of information within complex biological systems, such as the collective dynamics of astrocytes and the cytoskeleton in brain networks, is a step toward a fuller characterization of brain functioning and cognition.
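The information-theoretic treatment of analog Ca2+ traces can be illustrated with a minimal sketch: discretize two fluorescence time series, estimate their mutual information, and find the lag at which their cross-correlation peaks. This is a generic illustration, not the thesis's actual pipeline; the bin count, synthetic traces, and function names are assumptions.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def mutual_information(x, y, bins=16):
    """Histogram-based mutual information estimate (in nats) for two analog signals."""
    x_d = np.digitize(x, np.histogram_bin_edges(x, bins))
    y_d = np.digitize(y, np.histogram_bin_edges(y, bins))
    return mutual_info_score(x_d, y_d)

def peak_xcorr_lag(x, y):
    """Lag (in samples) at which the normalized cross-correlation peaks."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    corr = np.correlate(x, y, mode="full") / len(x)
    return corr.argmax() - (len(y) - 1)

# Two synthetic Ca2+ traces: the second is a delayed, noisy copy of the first.
rng = np.random.default_rng(0)
t = np.linspace(0, 60, 600)
trace_a = np.exp(-((t % 10) - 2) ** 2) + 0.05 * rng.standard_normal(t.size)
trace_b = np.roll(trace_a, 15) + 0.05 * rng.standard_normal(t.size)

print("MI (nats):", mutual_information(trace_a, trace_b))
print("peak lag (samples):", peak_xcorr_lag(trace_a, trace_b))
```

Dividing a peak correlation lag by the distance between two cells would give a transport-speed-like quantity of the kind compared across astrocyte subtypes in the thesis.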
Item: Enhanced Robot Planning and Perception Through Environment Prediction (2024)
Sharma, Vishnu Dutt; Tokekar, Pratap; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

Mobile robots rely on maps to navigate through an environment. In the absence of any map, a robot must build the map online from partial observations as it moves through the environment. Traditional methods build a map using only direct observations. In contrast, humans identify patterns in the observed environment and make informed guesses about what to expect ahead. Modeling these patterns explicitly is difficult due to the complexity of the environments, but such complex models can be approximated well by learning-based methods in conjunction with large training data. By extracting patterns, robots can use not only direct observations but also predictions of what lies ahead to navigate better through an unknown environment. In this dissertation, we present several learning-based methods that equip mobile robots with prediction capabilities for efficient and safer operation.

In the first part of the dissertation, we learn to predict using geometric and structural patterns in the environment. Partially observed maps provide invaluable cues for accurately predicting the unobserved areas. We first demonstrate the capability of general learning-based approaches to model these patterns for a variety of overhead map modalities. Then we employ task-specific learning for faster navigation in indoor environments by predicting 2D occupancy in nearby regions. This idea is further extended to 3D point cloud representations for object reconstruction. By predicting the shape of the full object from only partial views, our approach paves the way for efficient next-best-view planning, a crucial requirement for energy-constrained aerial robots. Deploying a team of robots can also accelerate mapping; our algorithms benefit from this setup, as more observations result in more accurate predictions and further improve task efficiency.

In the second part of the dissertation, we learn to predict using spatiotemporal patterns in the environment. We focus on dynamic tasks such as target tracking and coverage, where we seek decentralized coordination between robots. We first show how graph neural networks can be used for more scalable and faster inference while achieving coverage performance comparable to classical approaches. We find that differentiable design is instrumental here for end-to-end task-oriented learning. Building on this, we present a differentiable decision-making framework that consists of a differentiable decentralized planner and a differentiable perception module for dynamic tracking.

In the third part of the dissertation, we show how to harness semantic patterns in the environment. Adding semantic context to the observations can help robots decipher the relations between objects and infer what may happen next based on the activity around them. We present a pipeline that uses vision-language models with an overhead camera to capture a wider scene and provide assistance to humans and robots in it. We use this setup to implement an assistive robot that helps humans with daily tasks, and then present a semantic-communication-based collaborative setup of overhead and ground agents, highlighting the embodiment-specific challenges they may encounter and how these can be overcome.

The first three parts employ learning-based methods for predicting the environment. However, if the predictions are incorrect, they could pose a risk to the robot and its surroundings. The final part of the dissertation therefore presents risk-management methods with meta-reasoning over the predictions. We study two such methods: one extracting uncertainty from the prediction model for risk-aware planning, and another using a heuristic to adaptively switch between classical and prediction-based planning, resulting in safe and efficient robot navigation.
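A minimal sketch of the 2D occupancy-prediction idea: a small encoder-decoder that maps a partially observed occupancy grid to occupancy probabilities for every cell. The architecture, channel layout, and dummy data below are illustrative assumptions, not the dissertation's model.

```python
import torch
import torch.nn as nn

class OccupancyPredictor(nn.Module):
    """Tiny encoder-decoder that predicts occupancy for all cells of a grid.

    Input channels: (occupied, free, unobserved-mask); output: P(occupied).
    """
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, grid):
        return torch.sigmoid(self.decoder(self.encoder(grid)))

model = OccupancyPredictor()
grid = torch.rand(8, 3, 64, 64)                          # dummy partial maps
target = torch.randint(0, 2, (8, 1, 64, 64)).float()     # dummy ground truth
# Supervise against full ground-truth maps (available in simulation), so the
# network learns to fill in the unobserved regions.
loss = nn.functional.binary_cross_entropy(model(grid), target)
loss.backward()
```

In simulation, full ground-truth maps are available, so the loss can supervise even cells the robot has not yet observed; that is what turns map completion into a learnable prediction task.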
Item: Human-Centric Deep Generative Models: The Blessing and The Curse (2021)
Yu, Ning; Davis, Larry; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

Over the past years, deep neural networks have achieved significant progress in a wide range of real-world applications. In particular, my research puts a focused lens on deep generative models, a neural network solution that has proven effective in visual (re)creation. But is generative modeling a niche topic that should be researched on its own? My answer is emphatically no. In this thesis, I present the two sides of deep generative models: their blessing and their curse to human beings. Regarding what deep generative models can do for us, I demonstrate improvements in the performance and steerability of visual (re)creation. Regarding what we can do for deep generative models, my answer is to mitigate the security concerns raised by DeepFakes and to improve the minority inclusion of deep generative models.

For performance, I probe the use of attention modules and a dual contrastive loss in generative adversarial networks (GANs), which pushes photorealistic image generation to a new state of the art. For steerability, I introduce Texture Mixer, a simple yet effective approach to steerable texture synthesis and blending. For security, my research spans a series of GAN fingerprinting solutions that enable the detection and attribution of misused GAN-generated images. For inclusion, I investigate the biased misbehavior of generative models and present my solution for enhancing the minority inclusion of GAN models over underrepresented image attributes. All in all, I propose to deliver actionable insights for the applications of deep generative models and, ultimately, to contribute to human-generator interaction.

Item: Unblock: Interactive Perception for Decluttering (2021)
Govindaraj, Krithika; Aloimonos, Yiannis; Systems Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

Novel segmentation algorithms can easily identify objects that are occluded or partially occluded; however, in highly cluttered scenes the degree of occlusion is so high that some objects may not be visible to a static camera at all. In these scenarios, humans use action to change the configuration of the environment, elicit more information through perception, and process that information before taking the next action. Reinforcement learning models this behavior, but unlike humans it omits the phase in which perception data is understood, since images are used directly as observations. The aim of this thesis is to establish a novel method that uses perception data indirectly in reinforcement learning to address the task of decluttering a scene with a static camera.
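How "indirect" use of perception might look in practice: reduce a segmentation mask to a compact state vector and run tabular Q-learning on top of it, instead of feeding raw images to the policy. The features, action set, and dummy masks below are assumptions for illustration only, not the thesis's method.

```python
import numpy as np

def perception_features(seg_mask, num_objects=5):
    """Compress a segmentation mask (H x W of object ids, 0 = background)
    into a state vector: visible-pixel fraction per object id."""
    total = seg_mask.size
    return np.array([(seg_mask == i).sum() / total
                     for i in range(1, num_objects + 1)])

def discretize(state, levels=4):
    """Bin continuous fractions so the state can index a Q-table."""
    return tuple(np.minimum((state * levels).astype(int), levels - 1))

Q, alpha, gamma, eps = {}, 0.1, 0.9, 0.2
n_actions = 4  # e.g., push an object in one of four directions

def act(s, rng):
    """Epsilon-greedy action from the Q-table."""
    if rng.random() < eps or s not in Q:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s]))

def update(s, a, r, s_next):
    """Standard Q-learning update."""
    Q.setdefault(s, np.zeros(n_actions))
    Q.setdefault(s_next, np.zeros(n_actions))
    Q[s][a] += alpha * (r + gamma * Q[s_next].max() - Q[s][a])

# Dummy interaction: masks before and after a push action.
rng = np.random.default_rng(0)
mask = rng.integers(0, 6, size=(64, 64))
s = discretize(perception_features(mask))
a = act(s, rng)
mask_after = rng.integers(0, 6, size=(64, 64))
update(s, a, r=1.0, s_next=discretize(perception_features(mask_after)))
```

The design point is that the agent's state is a processed summary of perception, so the "understanding" step humans perform is made explicit rather than folded into the policy network.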
Item: Closing the Gap Between Classification and Retrieval Models (2021)
Taha, Ahmed; Davis, Larry; Shrivastava, Abhinav; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

Retrieval networks learn a feature embedding where similar samples are close together and different samples are far apart. This feature embedding is essential for computer vision applications such as face/person recognition, zero-shot learning, and image retrieval. Despite these important applications, retrieval networks are less popular than classification networks for multiple reasons: (1) the cross-entropy loss used with classification networks is more stable and converges faster than the metric learning losses used with retrieval networks; (2) the cross-entropy loss has a huge toolbox of utilities and extensions. For instance, both AdaCos and self-knowledge distillation have been proposed to tackle low sample complexity in classification networks; likewise, both CAM and Grad-CAM have been proposed to visualize attention in classification networks. To promote retrieval networks, it is important to equip them with an equally powerful toolbox. Accordingly, we propose an evolution-inspired approach to tackle low sample complexity in feature embedding. Then, we propose SVMax to regularize the feature embedding and avoid model collapse. Furthermore, we propose L2-CAF to visualize attention in retrieval networks.

To tackle low sample complexity, we propose an evolution-inspired training approach to boost performance on relatively small datasets. The knowledge evolution (KE) approach splits a deep network into two hypotheses: the fit-hypothesis and the reset-hypothesis. We iteratively evolve the knowledge inside the fit-hypothesis by perturbing the reset-hypothesis for multiple generations. This approach not only boosts performance but also learns a slim (pruned) network with a smaller inference cost. KE reduces both overfitting and the burden of data collection.

To regularize the feature embedding and avoid model collapse, we propose singular value maximization (SVMax) to promote a uniform feature embedding. Our formulation mitigates model collapse and enables larger learning rates. SVMax is oblivious to both the input class (labels) and the sampling strategy; thus it promotes a uniform feature embedding in both supervised and unsupervised learning. Furthermore, we present a mathematical analysis of the mean singular value's lower and upper bounds. This analysis makes tuning SVMax's balancing hyperparameter easier when the feature embedding is normalized to the unit circle.

To support retrieval networks with a visualization tool, we formulate attention visualization as a constrained optimization problem. We leverage the unit L2-norm constraint as an attention filter (L2-CAF) to localize attention in both classification and retrieval networks. This approach imposes no constraints on the network architecture besides having a convolutional layer. The input can be a regular image or a pre-extracted convolutional feature. The network output can be logits trained with cross-entropy or a space embedding trained with a ranking loss. Furthermore, this approach neither changes the original network weights nor requires fine-tuning, so network performance remains intact. The visualization filter is applied only when an attention map is required, posing no computational overhead during inference. L2-CAF visualizes the attention of the last convolutional layer of GoogLeNet within 0.3 seconds.

Finally, we propose a compromise between retrieval and classification networks: a simple yet effective two-head architecture, a network with both logits and feature-embedding heads. The embedding head, trained with a ranking loss, limits the overfitting capabilities of the cross-entropy loss by promoting a smooth embedding space. In our work, we leverage the semi-hard triplet loss to allow a dynamic number of modes per class, which is vital when working with imbalanced data. We also refute a common assumption that training with a ranking loss is computationally expensive: by moving both the triplet loss sampling and computation to the GPU, the training time increases by just 2%.
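The core of SVMax can be written in a few lines: alongside a ranking loss, penalize the negative mean singular value of the L2-normalized batch embedding matrix. The weight 0.1 and the toy tensors are assumptions; this sketches the regularizer's mechanics, not the paper's full training recipe.

```python
import torch

def svmax_penalty(embeddings):
    """Negative mean singular value of the (batch x dim) embedding matrix.

    Maximizing the mean singular value pushes the batch toward a uniform
    spread over the unit hypersphere, mitigating model collapse.
    """
    e = torch.nn.functional.normalize(embeddings, dim=1)  # unit L2 norm
    return -torch.linalg.svdvals(e).mean()

# Toy usage: a semi-hard-style triplet loss plus the SVMax regularizer.
emb = torch.randn(128, 64, requires_grad=True)
anchor, positive, negative = emb[:32], emb[32:64], emb[64:96]
triplet = torch.nn.functional.triplet_margin_loss(anchor, positive, negative)
loss = triplet + 0.1 * svmax_penalty(emb)
loss.backward()
```

Because the penalty depends only on the spread of the batch embeddings, not on labels or how triplets were sampled, it applies unchanged to supervised and unsupervised training, which matches the label-oblivious property claimed above.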
Item: DEEP LEARNING FOR FORENSICS (2020)
Zhou, Peng; Davis, Larry; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

The advent of media sharing platforms and the easy availability of advanced photo and video editing software have resulted in a large quantity of manipulated images and videos being shared on the internet. While the intent behind such manipulations varies widely, concerns about the spread of fake news and misinformation are growing. Detecting manipulation has therefore become an emerging necessity. Unlike traditional classification, semantic object detection, or segmentation, manipulation detection/classification pays more attention to low-level tampering artifacts than to semantic content. The main challenges in this problem include (a) investigating features that reveal tampering artifacts, (b) developing generic models that are robust to a wide range of post-processing methods, (c) applying algorithms to high-resolution images in real scenarios, and (d) handling newly emerging manipulation techniques. In this dissertation, we propose approaches to tackle these challenges.

Manipulation detection utilizes both low-level tampering artifacts and semantic content, suggesting that richer features need to be harnessed to reveal more evidence. To learn rich features, we propose a two-stream Faster R-CNN network and train it end-to-end to detect the tampered regions in a manipulated image. Experiments on four standard image manipulation datasets demonstrate that our two-stream framework outperforms each individual stream and achieves state-of-the-art performance compared to alternative methods, with robustness to resizing and compression. Additionally, to extend manipulation detection from images to video, we introduce VIDNet, a video inpainting detection network, which contains an encoder-decoder architecture with a quad-directional local attention module. To reveal artifacts encoded by compression, VIDNet additionally takes in Error Level Analysis (ELA) frames to augment the RGB frames, producing multimodal features at different levels with the encoder.

Moreover, to improve the generalization of manipulation detection models, we introduce a manipulated-image generation process that creates true positives using currently available datasets. Drawing from traditional work on image blending, we propose a novel generator for creating such examples, and we further create examples that force the algorithm to focus on boundary artifacts during training. Extensive experimental results validate our proposal. Furthermore, to apply deep learning models efficiently to high-resolution scenarios, we treat the problem as mask refinement given a coarse low-resolution prediction: we convert the regions of interest into strip images and compute a boundary prediction in the strip domain. Extensive experiments on both public datasets and a newly created high-resolution dataset strongly validate our approach. Finally, to handle newly emerging manipulation techniques while preserving performance on previously learned manipulations, we investigate incremental learning. We propose a multi-model, multi-level knowledge distillation strategy that preserves performance on old categories while training on new categories. Experiments on standard incremental learning benchmarks show that our method improves overall performance over standard distillation techniques.
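Error Level Analysis, the auxiliary modality VIDNet consumes, is straightforward to compute: re-save the image as JPEG at a known quality and take the per-pixel difference, which highlights regions with a different compression history. The quality setting and contrast stretch below are common choices, not values taken from the dissertation.

```python
import io
import numpy as np
from PIL import Image

def error_level_analysis(image_path, quality=90):
    """Return an ELA map: |original - JPEG-recompressed| per pixel, rescaled.

    Regions that were pasted in or inpainted often show a different error
    level than the rest of the image.
    """
    original = Image.open(image_path).convert("RGB")
    buffer = io.BytesIO()
    original.save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    recompressed = Image.open(buffer)
    # int16 avoids uint8 wraparound when subtracting.
    diff = np.abs(np.asarray(original, dtype=np.int16)
                  - np.asarray(recompressed, dtype=np.int16))
    scale = 255.0 / max(int(diff.max()), 1)  # stretch for visibility
    return (diff * scale).astype(np.uint8)

# Example usage (path is hypothetical):
# ela = error_level_analysis("suspect.jpg")
# Image.fromarray(ela).save("suspect_ela.png")
```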
Item: Effects of Slope Ratio, Straw Mulching, and Compost Amendment on Vegetation Establishment and Runoff Generation (2020)
Owen, Dylan; Davis, Allen P; Aydilek, Ahmet; Civil Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

Soil erosion management is a major environmental challenge facing highway construction. This study was undertaken to evaluate the effectiveness of compost use in lieu of topsoil for final-grade turfgrass establishment on highway slopes. Two compost types, biosolids and greenwaste, and four compost/topsoil blends were compared with a topsoil standard (TS; with straw and fertilizer application) in their ability to reduce soil and nutrient loss and improve vegetation establishment. A series of greenhouse studies and field tests was conducted to analyze the effects of slope ratio, straw mulching, and compost mixing ratio on runoff by observing green vegetation (GV) establishment, runoff volume generation, and nutrient and sediment export. GV was measured using an innovative image segmentation and classification algorithm coupled with machine learning approaches, with varying block sizes and classification acceptance thresholds. Algorithm classifications were compared to manual coverage classifications, yielding R-squared values of 0.86 for GV, 0.87 for straw/dormant vegetation, and 0.96 for exposed soil.

Straw mulching (≥95% straw cover) reduced evaporation rates and soil sealing and increased soil roughness and field capacity (FC), which significantly reduced runoff volume (by 34-99%) and the mass export of sediment and nutrients (by 81-91%). With mulching, no statistical differences were found in GV establishment among the compost and TS treatments (≥95% cover in 60 days), while non-mulched media cover reached a maximum of 35% due to limited moisture availability. Composted material (excluding 2:1 compost:topsoil mixtures) had higher hydraulic conductivity, FC, and shear strength than TS, which, combined with straw mulching, reduced total runoff volume by 33-72%. This led to sediment and nutrient mass reductions of 57-97% and 6-82%, respectively, relative to the standard TS. A general increase in runoff generation and decrease in GV were seen as the slope ratio steepened (41-96% more nutrient and sediment export and 81-97% lower GV from 20:1 to 2:1 slopes). However, benefits displayed at a 25% slope were reduced at shallower slopes and enhanced at steeper slopes. The use of compost as an additive or replacement for TS, with straw mulching, was seen to reduce runoff generation and improve runoff quality relative to the TS standard, and is suggested as a possible alternative.
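A much simpler baseline than the study's learned classifier conveys the idea of block-wise green-vegetation measurement: threshold an excess-green index and vote per block. The index, block size, and threshold are standard choices assumed for illustration, not the study's algorithm.

```python
import numpy as np

def green_cover_fraction(rgb, block=16, threshold=0.05):
    """Estimate green-vegetation cover from an H x W x 3 RGB image in [0, 1].

    Uses the excess-green index ExG = 2g - r - b on chromaticity-normalized
    pixels, then votes per block so small misclassified specks are ignored.
    """
    total = rgb.sum(axis=2) + 1e-8
    r, g, b = (rgb[..., i] / total for i in range(3))
    exg = 2 * g - r - b
    h, w = exg.shape
    blocks = exg[:h - h % block, :w - w % block].reshape(
        h // block, block, w // block, block)
    block_green = blocks.mean(axis=(1, 3)) > threshold
    return block_green.mean()

# Dummy image: left half vegetation-like, right half soil-like.
img = np.zeros((128, 128, 3))
img[:, :64] = [0.2, 0.6, 0.2]
img[:, 64:] = [0.5, 0.4, 0.3]
print(f"estimated green cover: {green_cover_fraction(img):.2f}")
```

The block-voting step plays the same role as the block size and acceptance threshold mentioned above: it trades pixel-level sensitivity for robustness of the coverage estimate.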
Item: Alternating Optimization: Constrained Problems, Adversarial Networks, and Robust Models (2019)
Xu, Zheng; Goldstein, Tom; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

Data-driven machine learning methods have achieved impressive performance for many industrial applications and academic tasks. Machine learning methods usually have two stages: training a model from large-scale samples, and inference on new samples after the model is deployed. The training of modern models relies on solving difficult optimization problems that involve nonconvex, nondifferentiable objective functions and constraints, which is sometimes slow and often requires expertise to tune hyperparameters. While inference is much faster than training, it is often not fast enough for real-time applications. We focus on machine learning problems that can be formulated as minimax problems during training, and study alternating optimization methods that serve as fast, scalable, stable, and automated solvers.

First, we focus on the alternating direction method of multipliers (ADMM) for constrained problems in classical convex and nonconvex optimization. Popular machine learning applications include sparse and low-rank models, regularized linear models, total-variation image processing, semidefinite programming, and consensus distributed computing. We propose adaptive ADMM (AADMM), a fully automated solver that achieves fast practical convergence by adapting the only free parameter in ADMM. We further automate several variants of ADMM (relaxed ADMM, multi-block ADMM, and consensus ADMM), and prove convergence-rate guarantees that are widely applicable to variants of ADMM with changing parameters. We release fast implementations for more than ten applications and validate their efficiency with several benchmark datasets for each application.

Second, we focus on the minimax problem of generative adversarial networks (GANs). We apply prediction steps to stabilize stochastic alternating methods for the training of GANs, and demonstrate the advantages of GAN-based losses for image processing tasks. We also propose GAN-based knowledge distillation methods to train small neural networks for inference acceleration, and empirically study the trade-off between acceleration and accuracy.

Third, we present preliminary results on adversarial training for robust models. We study fast algorithms for attack and defense against universal perturbations, and then explore network architectures that boost robustness.
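The alternating structure of ADMM, and the single free parameter that AADMM adapts, are easiest to see in a compact solver for the lasso. This is a textbook scaled-form ADMM for min 0.5||Ax - b||^2 + lam*||z||_1 subject to x = z, offered as a generic sketch; the fixed rho below is exactly the parameter that adaptive variants tune automatically.

```python
import numpy as np

def lasso_admm(A, b, lam, rho=1.0, iters=200):
    """Scaled-form ADMM for the lasso: min 0.5||Ax-b||^2 + lam*||z||_1, x = z."""
    n = A.shape[1]
    x = z = u = np.zeros(n)
    # The x-update solves the same linear system every iteration.
    AtA_rhoI = A.T @ A + rho * np.eye(n)
    Atb = A.T @ b
    for _ in range(iters):
        x = np.linalg.solve(AtA_rhoI, Atb + rho * (z - u))      # x-minimization
        v = x + u                                               # z-minimization:
        z = np.maximum(v - lam / rho, 0) - np.maximum(-v - lam / rho, 0)  # soft-threshold
        u = u + x - z                                           # dual update
    return z

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))
x_true = np.zeros(20)
x_true[:3] = [2.0, -1.5, 1.0]                # sparse ground-truth signal
b = A @ x_true + 0.01 * rng.standard_normal(100)
print(np.round(lasso_admm(A, b, lam=1.0), 2))
```

Each pass alternates a smooth minimization, a proximal (soft-threshold) step, and a dual ascent step; rho trades off how aggressively the constraint x = z is enforced, which is why adapting it per iteration can speed up practical convergence.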
Item: COMPUTER VISION AND DEEP LEARNING WITH APPLICATIONS TO OBJECT DETECTION, SEGMENTATION, AND DOCUMENT ANALYSIS (2017)
Du, Xianzhi; Davis, Larry; Doermann, David; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

This dissertation presents three works on signature matching for document analysis. In the first work, we propose a large-scale signature matching method based on locality-sensitive hashing (LSH). Shape Context features are used to describe the structure of signatures, and two stages of hashing are performed to find the nearest neighbors for query signatures. We show that our algorithm achieves high accuracy even when few signatures are collected from the same person, and that it performs fast matching on large datasets. In the second work, we present a novel signature matching method based on supervised topic models. Shape Context features are extracted from signature shape contours, capturing the local variations in signature properties. We then use the concept of topic models to learn the shape context features that correspond to individual authors, and we demonstrate considerable improvement over state-of-the-art methods. In the third work, we present a partial signature matching method using graphical models. Extending the second work, modified Shape Context features are extracted from the contours of signatures to describe both full and partial signatures, and hierarchical Dirichlet processes are implemented to infer the number of salient regions needed. The results show the effectiveness of the approach for both partial and full signature matching.

The dissertation also presents three works on deep learning for object detection and segmentation. In the first work, we propose a deep neural network fusion architecture for fast and robust pedestrian detection. The proposed network fusion architecture allows for parallel processing of multiple networks for speed. A single-shot deep convolutional network is trained as an object detector to generate all possible pedestrian candidates of different sizes and occlusions. Next, multiple deep neural networks are used in parallel to further refine these pedestrian candidates. We introduce a soft-rejection based network fusion method that fuses the soft metrics from all networks to generate the final confidence scores. Our method performs better than existing state-of-the-art methods, especially when detecting small and occluded pedestrians. Furthermore, we propose a method for integrating a pixel-wise semantic segmentation network into the network fusion architecture as a reinforcement to the pedestrian detector. In the second work, building on the first, a fusion network is trained to fuse the multiple classification networks. Furthermore, a novel soft-label method is devised to assign floating-point labels to the pedestrian candidates: the metric for each candidate detection is derived from the percentage of overlap of its bounding box with those of the ground-truth classes. In the third work, we propose a boundary-sensitive deep neural network architecture for portrait segmentation. A residual network and atrous-convolution based framework is trained as the base portrait segmentation network. To better handle boundary segmentation, three techniques are introduced. First, an individual boundary-sensitive kernel is introduced by labeling the boundary pixels as a separate class and using the soft-label strategy to assign floating-point label vectors to pixels in the boundary class; each pixel contributes to multiple classes when updating the loss, based on its relative position to the contour. Second, a global boundary-sensitive kernel is used when updating the loss function to assign different weights to pixel locations in an image, constraining the global shape of the resulting segmentation map. Third, we add multiple binary classifiers to classify boundary-sensitive portrait attributes, so as to refine the learning process of our model.
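The soft-label assignment reduces to a short function: derive a floating-point label per ground-truth class from bounding-box overlap. Intersection-over-union is used here as the overlap measure; the dissertation's exact overlap definition may differ, so treat this as an assumed variant.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(box_a) + area(box_b) - inter
    return inter / union if union > 0 else 0.0

def soft_labels(candidate, ground_truths):
    """Floating-point label vector: one overlap score per ground-truth box."""
    return [iou(candidate, gt) for gt in ground_truths]

# A candidate overlapping one pedestrian strongly and another weakly.
cand = (10, 10, 50, 90)
gts = [(12, 8, 52, 88), (45, 10, 85, 90)]
print([round(s, 2) for s in soft_labels(cand, gts)])
```

Training against such graded targets, rather than hard 0/1 labels, is what lets partially overlapping candidates contribute proportionally to the loss.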
Item: GEOMETRIC REPRESENTATIONS AND DEEP GAUSSIAN CONDITIONAL RANDOM FIELD NETWORKS FOR COMPUTER VISION (2016)
Vemulapalli, Raviteja; Chellappa, Rama; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

Representation and context modeling are two important factors that are critical in the design of computer vision algorithms. For example, in applications such as skeleton-based human action recognition, representations that capture the 3D skeletal geometry are crucial for achieving good action recognition accuracy. However, most existing approaches focus mainly on the temporal modeling and classification steps of the action recognition pipeline rather than on representations. Similarly, in applications such as image enhancement and semantic image segmentation, modeling the spatial context is important for achieving good performance, yet the standard deep network architectures used for these applications do not explicitly model it. In this dissertation, we focus on the representation and context-modeling issues for several computer vision problems and make novel contributions by proposing new 3D geometry-based representations for recognizing human actions from skeletal sequences, and by introducing Gaussian conditional random field model-based deep network architectures that explicitly model the spatial context by considering the interactions among the output variables. In addition, we propose a kernel-learning-based framework for the classification of manifold features such as linear subspaces and covariance matrices, which are widely used for image set-based recognition tasks. This dissertation is divided into five parts.

In the first part, we introduce various 3D geometry-based representations for the problem of skeleton-based human action recognition. The proposed representations, referred to as R3DG features, capture the relative 3D geometry between various body parts using 3D rigid body transformations. We model human actions as curves in these R3DG feature spaces and perform action recognition using a combination of dynamic time warping, the Fourier temporal pyramid representation, and support vector machines. Experiments on several action recognition datasets show that the proposed representations perform better than many existing skeletal representations.

In the second part, we represent 3D skeletons using only the relative 3D rotations between various body parts instead of full 3D rigid body transformations. This skeletal representation is scale-invariant and belongs to a Lie group based on the special orthogonal group. We model human actions as curves in this Lie group and map these curves to the corresponding Lie algebra by combining the logarithm map with rolling maps; using rolling maps reduces the distortions introduced in the action curves while mapping to the Lie algebra. Finally, we perform action recognition by classifying the Lie algebra curves using the Fourier temporal pyramid representation and a support vector machine classifier. Experimental results show that combining the logarithm map with rolling maps gives improved performance compared to using the logarithm map alone.
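The relative-rotation representation from the second part can be sketched with SciPy: form the relative rotation between each pair of body parts and map it to the Lie algebra so(3) via the logarithm map (the rotation vector). The toy joint set below is an assumption, and the rolling maps the dissertation uses to reduce distortion are omitted from this sketch.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def relative_rotation_features(part_rotations):
    """Map each pair of body-part rotations to so(3) via the log map.

    part_rotations: list of scipy Rotation objects, one per body part.
    Returns a flat feature vector of rotation vectors (axis * angle).
    """
    feats = []
    for i in range(len(part_rotations)):
        for j in range(i + 1, len(part_rotations)):
            rel = part_rotations[i].inv() * part_rotations[j]  # relative rotation
            feats.append(rel.as_rotvec())                      # log map: SO(3) -> R^3
    return np.concatenate(feats)

# Toy skeleton frame with three body-part orientations.
parts = [Rotation.from_euler("z", 10, degrees=True),
         Rotation.from_euler("x", 45, degrees=True),
         Rotation.from_euler("y", -30, degrees=True)]
print(relative_rotation_features(parts).round(3))
```

Because only relative rotations enter the feature, the representation is unchanged if the whole skeleton is uniformly scaled, which is the scale-invariance property noted above.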
In the third part, we focus on the classification of manifold features such as linear subspaces and covariance matrices. We present a kernel-based extrinsic framework for the classification of manifold features and address the issue of kernel selection using multiple kernel learning. We introduce two criteria for jointly learning the kernel and the classifier by solving a single optimization problem. In the case of the support vector machine classifier, we formulate the problem of learning a good kernel-classifier combination as a convex optimization problem. The proposed approach performs better than many existing methods for the classification of manifold features when applied to image set-based classification tasks.

In the fourth part, we propose a novel end-to-end trainable deep network architecture for image denoising based on a Gaussian conditional random field (CRF) model. Contrary to existing discriminative denoising approaches, the proposed network explicitly models the input noise variance and is hence capable of handling a range of noise levels. This network consists of two sub-networks: (i) a parameter generation network that generates the Gaussian CRF pairwise potential parameters based on the input image, and (ii) an inference network whose layers perform the computations involved in an iterative Gaussian CRF inference procedure. Experiments on several images show that the proposed approach produces results on par with the state of the art without training a separate network for each noise level.

In the final part of this dissertation, we propose a Gaussian CRF model-based deep network architecture for the task of semantic image segmentation. This network explicitly models the interactions between output variables, which is important for structured prediction tasks such as semantic segmentation. The proposed network is composed of three sub-networks: (i) a convolutional neural network (CNN) based unary network for generating the unary potentials, (ii) a CNN-based pairwise network for generating the pairwise potentials, and (iii) a Gaussian mean field inference network for performing Gaussian CRF inference. When trained end-to-end in a discriminative fashion, the proposed network outperforms various CNN-based semantic segmentation approaches.
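The Gaussian CRF inference network unrolls a fixed-point iteration whose layers are simple linear updates. For the illustrative energy E(x) = 0.5||x - z||^2 + 0.5 x^T W x (symmetric W with zero diagonal), each mean-field/Jacobi layer computes x <- z - Wx, and the fixed point solves (I + W)x = z. The potentials and sizes below are assumptions for a minimal sketch, not the networks' learned parameters.

```python
import numpy as np

def gaussian_crf_inference(z, W, iters=50):
    """Unrolled Jacobi/mean-field updates for a toy Gaussian CRF.

    Energy: E(x) = 0.5*||x - z||^2 + 0.5 * x^T W x, with W symmetric and
    zero-diagonal; the minimizer solves (I + W) x = z. Each "layer" of the
    inference network computes x <- z - W x.
    """
    x = z.copy()
    for _ in range(iters):
        x = z - W @ x
    return x

rng = np.random.default_rng(0)
n = 6
W = 0.05 * rng.standard_normal((n, n))
W = 0.5 * (W + W.T)
np.fill_diagonal(W, 0.0)                 # symmetric pairwise couplings
z = rng.standard_normal(n)               # unary "observations"
x = gaussian_crf_inference(z, W)
# The unrolled iterations converge to the exact Gaussian CRF posterior mean.
print(np.allclose(x, np.linalg.solve(np.eye(n) + W, z)))
```

In the denoising and segmentation networks described above, z and W are not fixed: they are produced by the unary and pairwise sub-networks, and the unrolled updates are trained end-to-end through this iteration.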