Electrical & Computer Engineering Theses and Dissertations

Permanent URI for this collection: http://hdl.handle.net/1903/2765

Search Results

Now showing 1 - 10 of 17
  • Item
    Representation Learning for Reinforcement Learning: Modeling Non-Gaussian Transition Probabilities with a Wasserstein Critic
    (2024) Tse, Ryan; Zhang, Kaiqing; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Reinforcement learning algorithms depend on effective state representations when solving complex, high-dimensional environments. Recent methods learn state representations using auxiliary objectives that aim to capture relationships between states that are behaviorally similar, meaning states that lead to similar future outcomes under optimal policies. These methods learn explicit probabilistic state transition models and compute distributional distances between state transition probabilities as part of their measure of behavioral similarity. This thesis presents a novel extension to several of these methods that directly learns the 1-Wasserstein distance between state transition distributions by exploiting the Kantorovich-Rubinstein duality. This method eliminates parametric assumptions about the state transition probabilities while providing a smoother estimator of distributional distances. Empirical evaluation demonstrates improved sample efficiency over some of the original methods and a modest increase in computational cost per sample. The results establish that relaxing theoretical assumptions about state transition modeling leads to more flexible and robust representation learning while maintaining strong performance characteristics.
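For readers unfamiliar with the duality named in this abstract, its standard textbook form is given below. The symbols are notation chosen here, not taken from the thesis: P and Q stand for two state transition distributions, and f ranges over 1-Lipschitz functions (the role a learned critic would approximate).

```latex
W_1(P, Q) \;=\; \sup_{\|f\|_{L} \le 1} \;
  \mathbb{E}_{x \sim P}\bigl[f(x)\bigr] \;-\; \mathbb{E}_{y \sim Q}\bigl[f(y)\bigr]
```

Because the supremum is taken over functions rather than over parameters of P and Q, an estimator of this form needs only samples from the two distributions, which is consistent with the abstract's claim of dropping parametric assumptions.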
  • Item
    Efficient learning-based sound propagation for virtual and real-world audio processing applications
    (2024) Ratnarajah, Anton Jeran; Manocha, Dinesh; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Sound propagation is the process by which sound energy travels through a medium, such as air, to the surrounding environment as sound waves. The room impulse response (RIR) describes this process and is influenced by the positions of the source and listener, the room's geometry, and its materials. Physics-based acoustic simulators have been used for decades to compute accurate RIRs for specific acoustic environments. However, we have encountered limitations with existing acoustic simulators. For example, they require a 3D representation and detailed material knowledge of the environment. To address these limitations, we propose three novel solutions. First, we introduce a learning-based RIR generator that is two orders of magnitude faster than an interactive ray-tracing simulator. Our approach can be trained to input both statistical and traditional parameters directly, and it can generate both monaural and binaural RIRs for both reconstructed and synthetic 3D scenes. Our generated RIRs outperform interactive ray-tracing simulators in speech-processing applications, including Automatic Speech Recognition (ASR), Speech Enhancement, and Speech Separation, by 2.5%, 12%, and 48%, respectively. Secondly, we propose estimating RIRs from reverberant speech signals and visual cues in the absence of a 3D representation of the environment. By estimating RIRs from reverberant speech, we can augment training data to match test data, improving the word error rate of the ASR system. Our estimated RIRs achieve a 6.9% improvement over previous learning-based RIR estimators in real-world far-field ASR tasks. We demonstrate that our audio-visual RIR estimator aids tasks like visual acoustic matching, novel-view acoustic synthesis, and voice dubbing, validated through perceptual evaluation. Finally, we introduce IR-GAN to augment accurate RIRs using real RIRs. 
    IR-GAN parametrically controls acoustic parameters learned from real RIRs to generate new RIRs that imitate different acoustic environments, outperforming ray-tracing simulators on the Kaldi far-field ASR benchmark by 8.95%.
  • Item
    Learning Autonomous Underwater Navigation with Bearing-Only Data
    (2024) Robertson, James; Duraiswami, Ramani; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Recent applications of deep reinforcement learning in controlling maritime autonomous surface vessels have shown promise for integration into maritime transportation. These could have the potential to reduce at-sea incidents such as collisions and groundings, which are largely attributed to human error. With this in mind, the goal of this work is to evaluate how well a similar deep reinforcement learning agent could perform the same task in submarines but using passive SONAR rather than the ranging data provided by active RADAR aboard surface vessels. A simulated submarine outfitted with a passive spherical, hull-mounted SONAR sensor is placed into contact scenarios under the control of a reinforcement learning agent and directed to make its way to a navigational waypoint while avoiding interfering surface vessels. In order to see how this best translates to lower power autonomous vessels (vice warship submarines), no estimation for the range of the surface vessels is maintained in order to cut down on computing requirements. Inspired by my time aboard U.S. Navy submarines, the agent is provided with simply the simulated passive SONAR data. I show that this agent is capable of navigating to a waypoint while avoiding crossing, overtaking, and head-on surface vessels and thus could provide a recommended course to a submarine contact management team in ample time, since the maneuvers made by the agent are not instantaneous, in contrast to the assumptions of traditional target tracking with bearing-only data. Additionally, an in-progress plugin for Epic Games’ Unreal Engine is presented with the ability to simulate underwater acoustics inside the 3D development software. Unreal Engine is a powerful 3D game engine that is incredibly flexible and capable of being integrated into many different forms of scientific research. This plugin could provide researchers with the ability to conduct useful simulations in intuitively designed 3D environments.
  • Item
    Cardiovascular Physiological Monitoring Based on Video
    (2023) Gebeyehu, Henok; Wu, Min; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Regular, continuous monitoring of the heart is advantageous to maintaining one’s cardiovascular health as it enables the early detection of potentially life-threatening cardiovascular diseases. Typically, the required devices for continuous monitoring are found in a clinical setting, but recent research developments have advanced remote physiological monitoring capabilities and expanded the options for continuous monitoring from home. This thesis focuses on further extending the monitoring capabilities of consumer electronic devices to motivate the feasibility of reconstructing Electrocardiograms (ECG) via a smartphone camera. First, the relationship between skin tone and remote physiological sensing is examined, as variations in melanin concentrations for people of diverse skin tones can affect remote physiological sensing. In this work, a study is performed to observe the prospect of reducing the performance disparity caused by melanin differences by exploring the sites from which the physiological signal is collected. Second, the physiological signals obtained from the previous part are enhanced to improve the signal-to-noise ratio and utilized to infer ECG as part of a novel technique that emphasizes interpretability as a guiding principle. The findings in this work have the potential to enable and promote the remote sensing of a physiological signal that is more informative than what is currently possible with remote sensing.
  • Item
    Robust Reinforcement Learning via Risk-Sensitivity
    (2023) Noorani, Erfaun; Baras, John S; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The objective of this research is to develop robust-resilient-adaptive Reinforcement Learning (RL) systems that are generic, provide performance guarantees, and can generalize-reason-improve in complex and unknown task environments. To achieve this objective, we focus on exploring the concept of Risk-sensitivity in RL systems and its extensions to Multi-Agent (MA) systems. The development of robust reinforcement learning algorithms is crucial to address challenges such as model misspecification, parameter uncertainty, disturbances, and more. Risk-sensitive methods offer an approach to developing robust RL algorithms by hedging against undesirable outcomes in a probabilistic manner. The robustness properties of risk-sensitive controllers have long been established. We investigate risk-sensitive RL (as a generalization of risk-sensitive stochastic control), by theoretically analyzing the risk-sensitive exponential (exponential of the total reward) criteria and the benefits and improvements the introduction of risk-sensitivity brings to conventional RL. By considering exponential criteria as risk measures, we aim to enhance the reliability of our decision-making process. We explore the exponential criteria to better understand its representation, the implications of its optimization, and the behavioral characteristics exhibited by an agent optimizing this criterion. We demonstrate the advantages of utilizing exponential criteria for the development of RL algorithms. We then shift our focus to developing algorithms that effectively leverage these exponential criteria. To do that, we first focus on developing risk-sensitive RL algorithms within the framework of Markov Decision Processes (MDPs). We then broaden our scope by exploring the application of the Probabilistic Graphical Models (PGM) framework for developing risk-sensitive algorithms. Within this context, we delve into the PGM framework and examine its connection with the MDP framework. 
We proceed by exploring the effects of risk sensitivity on trust, collaboration, and cooperation in multi-agent systems. To conclude, we finally investigate the concept of risk sensitivity and the robust properties of risk-sensitive algorithms in decision-making and optimization domains beyond RL. Specifically, we focus on safe RL using risk-sensitive filters. Through our exploration, we seek to enhance the understanding and applicability of risk-sensitive approaches in various domains.
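The exponential criterion mentioned in this abstract can be illustrated with a small numerical sketch. The abstract only describes it as the "exponential of the total reward", so the normalized form used here, (1/beta) * log E[exp(beta * R)], its sign convention, and the function name `exponential_criterion` are assumptions made for illustration, not the author's formulation.

```python
import math


def exponential_criterion(returns, beta):
    """Estimate (1/beta) * log(mean(exp(beta * R))) from sampled total rewards.

    Assumed convention (not from the thesis): beta < 0 is risk-averse,
    beta > 0 is risk-seeking, and beta == 0 falls back to the plain mean.
    """
    if beta == 0:
        return sum(returns) / len(returns)
    scaled = [beta * r for r in returns]
    # Subtract the maximum before exponentiating to avoid overflow.
    m = max(scaled)
    log_mean_exp = m + math.log(sum(math.exp(s - m) for s in scaled) / len(scaled))
    return log_mean_exp / beta


# Two equally likely outcomes with mean 5: a risky return distribution.
samples = [0.0, 10.0]
risk_averse = exponential_criterion(samples, beta=-1.0)
risk_seeking = exponential_criterion(samples, beta=1.0)
```

Under this convention the risk-averse value falls below the sample mean and the risk-seeking value rises above it, which is the hedging behavior the abstract attributes to risk-sensitive methods.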
  • Item
    Efficient Machine Learning Techniques for Neural Decoding Systems
    (2022) Wu, Xiaomin; Bhattacharyya, Shuvra S.; Chen, Rong; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    In this thesis, we explore efficient machine learning techniques for calcium imaging based neural decoding in two directions: first, techniques for pruning neural network models to reduce computational complexity and memory cost while retaining high accuracy; second, new techniques for converting graph-based input into low-dimensional vector form, which can be processed more efficiently by conventional neural network models. Neural decoding is an important step in connecting brain activity to behavior, e.g., to predict movement based on acquired neural signals. Important application areas for neural decoding include brain-machine interfaces and neuromodulation. For application areas such as these, real-time processing of neural signals is important, as is high-quality information extraction from the signals. Calcium imaging is a modality that is of increasing interest for studying brain activity. Miniature calcium imaging is a neuroimaging modality that can observe cells in behaving animals with high spatial and temporal resolution, and with the capability to provide chronic imaging. Compared to alternative modalities, calcium imaging has potential to enable improved neural decoding accuracy. However, processing calcium images in real-time is a challenging task as it involves multiple time-consuming stages: neuron detection, motion correction, and signal extraction. Traditional neural decoding methods, such as those based on Wiener and Kalman filters, are fast; however, they are outperformed in terms of accuracy by recently-developed deep neural network (DNN) models. While DNNs provide improved accuracy, they involve high computational complexity, which exacerbates the challenge of real-time processing. Addressing the challenges of high-accuracy, real-time, DNN-based neural decoding is the central objective of this research. As a first step in addressing these challenges, we have developed the NeuroGRS system.
    NeuroGRS is designed to explore design spaces for compact DNN models and optimize the computational complexity of the models subject to accuracy constraints. GRS, which stands for Greedy inter-layer order with Random Selection of intra-layer units, is an algorithm that we have developed for deriving compact DNN structures. We have demonstrated the effectiveness of GRS to transform DNN models into more compact forms that significantly reduce processing and storage complexity while retaining high accuracy. While NeuroGRS provides useful new capabilities for deriving compact DNN models subject to accuracy constraints, the approach has a significant limitation in the context of neural decoding. This limitation is its lack of scalability to large DNNs. Large DNNs arise naturally in neural decoding applications when the brain model under investigation involves a large number of neurons. As the size of the input DNN increases, NeuroGRS becomes prohibitively expensive in terms of computational time. To address this limitation, we have performed a detailed experimental analysis of how pruned solutions evolve as GRS operates, and we have used insights from this analysis to develop a new DNN pruning algorithm called Jump GRS (JGRS). JGRS maintains similar levels of model quality (in terms of predictive accuracy) as GRS while operating much more efficiently and being able to handle much larger DNNs under reasonable amounts of time and reasonable computational resources. Jump GRS incorporates a mechanism that bypasses ("jumps over") validation and retraining during carefully-selected iterations of the pruning process. We demonstrate the advantages and improved scalability of JGRS compared to GRS through extensive experiments in the context of DNNs for neural decoding. We have also developed methods for raising the level of abstraction in the signal representation used for calcium imaging analysis.
    As a central part of this work, we invented the WGEVIA (Weighted Graph Embedding with Vertex Identity Awareness) algorithm, which enables DNN-based processing of neuron activity that is represented in the form of microcircuits. In contrast to traditional representations of neural signals, which involve spiking signals, a microcircuit representation is a graphical representation. Each vertex in a microcircuit corresponds to a neuron, and each edge carries a weight that captures information about firing relationships between the neurons associated with the vertices that are incident to the edge. Our experiments demonstrate that WGEVIA is effective at extracting information from microcircuits. Moreover, raising the level of abstraction to microcircuit analysis has the potential to enable more powerful signal extraction under limited processing time and resources.
  • Item
    Towards in-the-wild visual understanding
    (2022) Rambhatla, Sai Saketh; Chellappa, Rama; Shrivastava, Abhinav; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Computer vision research has seen tremendous success in recent times. This success can be attributed to recent breakthroughs in deep learning technology, and such systems have been shown to achieve superhuman performance on several academic datasets. Driven by this success, these systems are actively being deployed in several household and industrial applications like robotics. However, current systems perform poorly when deployed in the real world, a.k.a. in-the-wild, as most of the assumptions made during the modeling stage are violated. For example, object detectors require clean data for training and are not effective in detecting or rejecting novel categories not seen in the data. In this thesis, we systematically identify problems that arise in a typical learning setup (the input, the model, and the output) and propose effective solutions to mitigate them.
  • Item
    Decoding the Brain in Complex Auditory Environments
    (2022) Rezaeizadeh, Mohsen; Shamma, Shihab; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Humans have an exceptional ability to engage with sequences of sounds and extract meaningful information from them. We can appreciate music or absorb speech during a conversation, unlike anything else on the planet. It is unclear exactly how the brain effortlessly processes these rapidly changing complex soundscapes. This dissertation explored the neural mechanisms underlying these remarkable traits in an effort to expand our knowledge of human cognition with numerous clinical and engineering applications. Brain-imaging techniques have provided a powerful tool to access the content and dynamics of mental representations. Non-invasive imaging such as Electroencephalography (EEG) and Magnetoencephalography (MEG) provides a fine-grained dissection of the sequence of brain activities. The analysis of these time-resolved signals can be enhanced with temporal decoding methods that offer vast and untapped potential for determining how mental representations unfold over time. In the present thesis, we use these decoding techniques, along with a series of novel experimental paradigms, on EEG and MEG signals to investigate the neural mechanisms of auditory processing in the human brain, ranging from neural representation of acoustic features to the higher level of cognition, such as music perception and speech imagery. First, we reported our findings regarding the role of temporal coherence in auditory source segregation. We showed that the perception of a target sound source can only be segregated from a complex acoustic background if the acoustic features (e.g., pitch, location, and timbre) induce temporally modulated neural responses that are mutually correlated. We used EEG signals to measure the neural responses to the individual acoustic feature in complex sound mixtures. We decoded the effect of attention on these responses.
We showed that attention and the coherent temporal modulation of the acoustic features of the target sound are the key factors that induce the binding of the target features and its emergence as the foreground sound source. Next, we explored how the brain learns the statistical structures of sound sequences in different musical contexts. The ability to detect probabilistic patterns is central to many aspects of human cognition, ranging from auditory perception to the enjoyment of music. We used artificially generated melodies derived from uniform or non-uniform musical scales. We collected EEG signals and decoded the neural responses to the tones in a melody with different transition probabilities. We observed that the listener's brain only learned the melodies' statistical structures when derived from non-uniform scales. Finally, we investigated brain processing during speech and music imagery with Brain-Computer Interface applications. We developed an encoder-decoder neural network architecture to find a transformation between neural responses to the listened and imagined sounds. Using this map, we could reconstruct the imagery signals reliably, which could be used as a template to decode the actual imagery neural signals. This was possible even when we generalized the model to unseen data of an unseen subject. We decoded these predicted signals and identified the imagined segment with remarkable accuracy.
  • Item
    Demystifying Monaural Speech Segregation Models
    (2022) Parikh, Rahil; Shamma, Shihab A; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The ‘cocktail party problem’ is the task of attending to a source of interest, usually speech, in a complex acoustic environment with concurrent sounds. Despite the apparent ease with which humans can group acoustic cues from such an environment and organize them to meaningfully perceive them, the complexity of this problem has inspired generations of neuroscientists, psychologists and engineers to develop multi-disciplinary solutions to this problem, ranging from biologically-inspired frameworks to strictly engineering solutions. In this dissertation we first explore the biologically plausible ‘Temporal Coherence’ algorithm to perform monaural source segregation based on the timing cues of each speaker. This approach integrates biologically plausible feature extraction and hypotheses of sound object perception with current trends in deep learning. It focuses on speech segregation and de-noising in an unsupervised and online fashion. Our findings suggest that this framework is suitable for de-noising applications but is unreliable for segregating mixtures of speech in its current setting. We then explore the recent advancements in deep learning which have led to drastic improvements in speech segregation models. Despite their success and growing applicability, few efforts have been made to analyze the underlying principles that these networks learn to perform segregation. Here we analyze the role of harmonicity on two state-of-the-art Deep Neural Network (DNN) based models: Conv-TasNet and DPT-Net. We evaluate their performance with mixtures of natural speech versus slightly manipulated inharmonic speech, where harmonics are slightly frequency jittered. We find that performance deteriorates significantly if one source is even slightly harmonically jittered, e.g., an imperceptible 3% harmonic jitter degrades performance of Conv-TasNet from 15.4 dB to 0.70 dB.
    Training the model on inharmonic speech does not remedy this sensitivity, instead resulting in worse performance on natural speech mixtures, making inharmonicity a powerful adversarial factor in DNN models. Furthermore, additional analyses reveal that DNN algorithms deviate markedly from the biologically inspired Temporal Coherence algorithm. Knowing that harmonicity is a critical cue for these networks to group sources, we then perform a thorough investigation on Conv-TasNet and DPT-Net to analyze how they perform a harmonic analysis of the input mixture. We perform ablation studies where we apply low-pass, high-pass, and band-stop filters of varying pass-bands to empirically analyze the harmonics most critical for segregation. We also investigate how these networks decide which output channel to assign to an estimated source by introducing discontinuities in synthetic mixtures. We find that end-to-end networks are highly unstable, and perform poorly when confronted with deformations which are imperceptible to humans. Replacing the encoder in these networks with a spectrogram leads to lower overall performance, but much higher stability. This work helps us to understand what information these networks rely on for speech segregation, and exposes two sources of generalization errors. It also pinpoints the encoder as the part of the network responsible for these generalization errors, allowing for a redesign with expert knowledge or transfer learning. The work in this dissertation helps demystify end-to-end speech segregation networks and takes a step towards solving the cocktail party problem.
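To make the idea of "frequency jittered" harmonics concrete, here is a minimal sketch on a synthetic tone. The dissertation manipulates natural speech and does not state in this abstract how the jitter is drawn, so the per-harmonic uniform offset, the function names `harmonic_frequencies` and `synthesize`, and all parameter values are assumptions for illustration only.

```python
import math
import random


def harmonic_frequencies(f0, n_harmonics, jitter=0.0, seed=0):
    """Return frequencies k * f0 * (1 + u_k), with u_k uniform in [-jitter, jitter].

    jitter=0.0 gives a perfectly harmonic series; jitter=0.03 shifts each
    harmonic by at most 3% (an assumed reading of "3% harmonic jitter").
    """
    rng = random.Random(seed)
    freqs = []
    for k in range(1, n_harmonics + 1):
        offset = rng.uniform(-jitter, jitter) if jitter > 0 else 0.0
        freqs.append(k * f0 * (1.0 + offset))
    return freqs


def synthesize(freqs, duration=0.01, sample_rate=16000):
    """Sum equal-amplitude sinusoids at the given frequencies, scaled to [-1, 1]."""
    n_samples = round(duration * sample_rate)
    return [
        sum(math.sin(2.0 * math.pi * f * t / sample_rate) for f in freqs) / len(freqs)
        for t in range(n_samples)
    ]


harmonic = harmonic_frequencies(100, 5)
inharmonic = harmonic_frequencies(100, 5, jitter=0.03)
waveform = synthesize(inharmonic)
```

A pair of signals built this way differs only in how closely the partials sit on integer multiples of the fundamental, which is the cue the abstract reports the networks depend on.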
  • Item
    RELIABLE MACHINE LEARNING: ROBUSTNESS, CALIBRATION, AND REPRODUCIBILITY
    (2021) Liu, Chihuang; JaJa, Joseph; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Modern machine learning (ML) algorithms are being applied today to a rapidly increasing number of tasks that affect the lives and well-being of people across the globe. Despite all the successes of artificial intelligence (AI), these methods are not always reliable and in fact are often quite brittle. It has been shown that a wide range of recent ML algorithms are vulnerable to adversarial attacks, and are over-confident even when they are not accurate. In this dissertation, we focus on the overall goal of making machine learning algorithms more reliable in terms of adversarial robustness, confidence calibration, and reproducibility. In the first part of the thesis, we explore novel approaches to improve the adversarial robustness of a deep neural network. We present a method that involves feature regularization and attention-based feature prioritization to motivate the model to only learn and rely on robust features that are not manipulated by the adversarial perturbation. We show that the resulting model is significantly more robust than other existing methods. In the second part of the thesis, we discover that the current training scheme of using one-hot labels under cross-entropy loss is a major cause of the over-confident behavior of deep neural networks. We propose a generalized definition of confidence calibration that requires the entire output to be calibrated. This approach leads to a novel form of the smooth labeling algorithm, called class-similarity based label smoothing, which tries to approximate a distribution that is optimal for generalized confidence calibration. We show that a model trained with the proposed smooth labels is significantly better calibrated than all existing methods. In the third part of the thesis, we propose an approach that can improve the calibration performance of robust models.
We first learn a representation space using prototypical learning which bases its classification on the distances between the representation of a sample and the representations of each class prototype. We then use the distance information to train a confidence prediction network to encourage the model to make calibrated predictions. We demonstrate through extensive experiments that our method can improve the calibration performance of a model while maintaining comparable accuracy and adversarial robustness levels. In the fourth part of the thesis, we tackle the problem of determining reproducible, large-scale functional patterns for the whole brain from a group of fMRI subjects. Because of the non-linear nature of the signals and significant inter-subject variability, how to reliably extract patterns that are reproducible across subjects has been a challenging task. We propose a group-level model, called LEICA, that uses Laplacian eigenmaps as the main data reduction step to preserve the correlation information in the original data as best as possible in a certain rigorous sense. The nonlinear map is robust relative to noise in the data and inter-subject variability. We show that LEICA detects functionally cohesive maps that are much more reproducible than the state-of-the-art methods.
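The contrast between one-hot labels and smoothed labels described in this abstract can be sketched as follows. Uniform label smoothing is the standard technique; the similarity-weighted variant is only an illustrative guess at what a "class-similarity based" scheme could look like. The dissertation's actual formulation is not given in the abstract, and the similarity matrix, the function names, and the value of `eps` are hypothetical.

```python
def uniform_smooth(target, num_classes, eps=0.1):
    """Standard label smoothing: move eps of the mass evenly onto the other classes."""
    off_value = eps / (num_classes - 1)
    return [1.0 - eps if c == target else off_value for c in range(num_classes)]


def similarity_smooth(target, similarity, eps=0.1):
    """Assumed variant: spread eps over the other classes in proportion to
    their similarity to the target class (similarity[target][c] >= 0)."""
    row = similarity[target]
    total = sum(s for c, s in enumerate(row) if c != target)
    if total == 0:
        # No similarity information: fall back to uniform smoothing.
        return uniform_smooth(target, len(row), eps)
    return [1.0 - eps if c == target else eps * s / total for c, s in enumerate(row)]


# Hypothetical similarities: class 0 resembles class 1 far more than class 2.
similarity = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
uniform_label = uniform_smooth(0, 3)
similar_label = similarity_smooth(0, similarity)
```

Both targets keep 0.9 on the true class, but the similarity-weighted one places more of the remaining mass on the class that is easier to confuse with it, so the whole output vector, not just the top prediction, carries a calibration signal.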