Electrical & Computer Engineering Theses and Dissertations



Recent Submissions

Now showing 1 - 20 of 1112
  • Item
    (2023) Atrey, Pranjal; Dutta, Sanghamitra; Wu, Min; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Consumers’ reliance on product reviews and ratings has a substantial impact on purchasing behavior in e-commerce. However, the relationship between reviews and ratings has received limited attention. For instance, a product may have a high rating but average reviews. Such feedback can cause confusion and uncertainty about a product, reducing trust in it. This thesis carries out a natural-language-based machine learning study of this relationship using e-commerce big data of product reviews and ratings. To answer this question with natural language processing (NLP), we first employ data-driven sentiment analysis to obtain numeric sentiment scores from the reviews, which are then correlated with the actual ratings. For sentiment analysis, we consider both a glass-box (rule-based) model and a black-box, opaque (BERT) model. We find that while the black-box model's scores are more strongly correlated with product ratings, there are interesting counterexamples where the glass-box model's sentiment results are better aligned with the rating. Next, we explore how well ratings can be predicted from the text reviews, and whether sentiment scores can further improve the classification of reviews. We find that neither the opaque nor the glass-box classification models yield better accuracy than the other, and that classification accuracy mostly improves when the reviews are augmented with BERT sentiment scores. Furthermore, to understand what different models rely on to predict ratings from reviews, we employ Local Interpretable Model-Agnostic Explanations (LIME) to explain the impact of words in reviews on the decisions of the classification models. Noting that different models can give similar predictions, a phenomenon known as the Rashomon Effect, our work provides insights into which words actually contribute to the decision-making of classification models, even in scenarios where an incorrect classification is made.
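The correlation step described above can be sketched in a few lines. This is a minimal illustration, not the thesis's pipeline. A tiny hypothetical word lexicon stands in for the glass-box (rule-based) scorer, no BERT model is involved, and Pearson correlation is assumed as the correlation measure.

```python
from math import sqrt

# Hypothetical toy lexicon standing in for a rule-based (glass-box) sentiment model.
LEXICON = {"great": 1.0, "good": 0.5, "fine": 0.1, "bad": -0.5, "terrible": -1.0}


def lexicon_score(review):
    """Average lexicon polarity of the words in a review (0.0 if no word matches)."""
    hits = [LEXICON[w] for w in review.lower().split() if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical reviews and star ratings (not data from the thesis).
reviews = ["great product great price", "good but slow", "terrible", "bad fit", "fine"]
ratings = [5, 4, 1, 2, 3]
scores = [lexicon_score(r) for r in reviews]
print(round(pearson(scores, ratings), 3))
```

A counterexample of the kind the abstract mentions would be a review whose score disagrees with its rating. Such a review would show up as an outlier that lowers the coefficient.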
  • Item
    (2023) Rahman, Tahmid Sami; Srinivasan, Kartik; Waks, Edo; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Improving coupling between integrated photonic chips and optical fibers is important for many applications. For photonic integrated circuits, several coupling methods have been implemented, including edge coupling, grating coupling, and 3D integration using direct laser writing. Silicon nitride is a well-established material for nonlinear optical phenomena such as frequency comb generation and optical parametric oscillation. In this thesis, coupling mechanisms based on direct laser writing are presented for use in nonlinear integrated photonics. Simulations show that a polymer tapered coupler printed on a single-mode fiber could be a good alternative to a cleaved fiber and comparable to a lensed fiber. It is also shown that an out-of-plane polymer coupler on a silicon nitride access waveguide could be a promising alternative for coupling to nonlinear integrated photonic circuits while avoiding chip separation and facet polishing. Both mechanisms could be good coupling options for shorter-wavelength applications.
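Comparing cleaved, lensed, and tapered-polymer couplers largely comes down to how well the fiber mode matches the chip mode. A standard first-order estimate treats both as aligned circular Gaussian beams. It is not the full simulation used in the thesis. The mode radii below are hypothetical, apart from the roughly 5.2 µm radius typical of standard single-mode fiber at 1550 nm.

```python
from math import exp


def gaussian_coupling(w1, w2, offset=0.0):
    """Power coupling efficiency between two circular Gaussian modes.

    w1, w2 -- 1/e^2 intensity mode-field radii (same length unit as offset)
    offset -- lateral misalignment between the two mode centres
    Assumes a common wavelength, flat phase fronts at the interface, no tilt,
    and no reflection loss.
    """
    s = w1 ** 2 + w2 ** 2
    mismatch = (2 * w1 * w2 / s) ** 2            # mode-size mismatch term
    return mismatch * exp(-2 * offset ** 2 / s)  # lateral-offset penalty


# Hypothetical mode radii in micrometres (not values from the thesis).
fiber_mode = 5.2   # roughly a standard single-mode fiber at 1550 nm
small_mode = 1.5   # an assumed tightly confined waveguide mode
print(gaussian_coupling(fiber_mode, small_mode))       # cleaved fiber: large size mismatch
print(gaussian_coupling(small_mode, small_mode, 0.3))  # matched modes, 0.3 um offset
```

A lensed fiber or printed taper aims to shrink the fiber mode toward the chip mode. In this model that drives the mismatch term toward 1, while alignment tolerance tightens as the modes get smaller.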
  • Item
    Design and Optimization of 5G and Beyond Hybrid Communication Systems
    (2023) Torkzaban, Nariman; Baras, John JB; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    5G and beyond communication systems are envisaged to fulfill three key promises that enable novel use cases and applications such as telemedicine, augmented reality/virtual reality (AR/VR), smart manufacturing, and autonomous vehicles (AVs). These three key promises are i) enhanced mobile broadband (eMBB), ii) ultra-reliable low-latency communications (URLLC), and iii) massive machine-type communications (mMTC). In other words, 5G is required to meet key performance indicators (KPIs) in terms of low latency, massive device connectivity, consistent quality of service (QoS), and high security. For instance, user bit rates of up to 10 Gbps and round-trip times (RTTs) as small as 1–10 ms are demanded in specific 5G application scenarios. To achieve these key promises, it is essential to use the capacity of all types of communication networks (terrestrial, space, aerial) and supporting technologies (SDN, NFV, etc.) simultaneously. This leads to so-called hybrid communication networks, as opposed to traditional stand-alone ones. It underscores the importance of seamless integration and configuration policies tailored to the specific use cases and QoS requirements of 5G and beyond services. It also spawns several challenging design and optimization problems, ranging from control and management to the physical layer of next-generation systems. In this thesis, we address such critical problems over nine chapters. In the second chapter, we study the benefits of incorporating trust into decision-making for resource provisioning in next-generation communication networks. In this regard, we study the trust-aware service function chain (SFC) embedding problem for enhancing the reliability of virtual network function (VNF) placement on trusted infrastructure. The problem of placing the VNFs on the NFV infrastructure (NFVI) and establishing the routing paths between them, according to the service chain template, is termed SFC embedding.
The objectives and constraints of the SFC embedding optimization problem may vary depending on the corresponding network service. We introduce the notion of trustworthiness as a measure of security in SFC embedding, and thus in network service deployment. We formulate the resulting trust-aware SFC embedding problem as a Mixed Integer Linear Program (MILP). We then relax the integer constraints to obtain a Linear Program (LP) with lower time complexity. We investigate the trade-offs between the two formulations, seeking a balance between accuracy and time complexity. The space-air-ground integrated network (SAGIN) offers benefits that are not otherwise possible, including global coverage, low latency, and high reliability. On the other hand, two factors highlight the need for a unified management structure and a dynamic resource allocation policy that are both scalable and flexible enough to handle the increasing complexity. These are the heterogeneity of the integrated network, with its non-unified interfaces, and the diversity of 5G use cases with large-scale applications. In the third chapter, we first optimize the integration of the hybrid network by deploying satellite gateways on the ground segment, ensuring proper connection between the layers with minimum latency. We then aim to provide a seamless management and control scheme for the hybrid network that uses the capacities of the supporting technologies, software-defined networking (SDN) and network function virtualization (NFV). In particular, we study the problem of SDN controller placement with the goal of maximizing the reliability of the hybrid network. In the fourth chapter, we propose trust as a metric for the trustworthiness of federated learning (FL) agents, thereby enhancing the security of FL training.
We first elaborate on trust as a security metric by presenting a mathematical framework for trust computation and aggregation within a multi-agent system. We then discuss how this framework can be incorporated into an FL setup, introducing the trusted FL algorithm for both centralized and decentralized FL. Next, we propose a framework for decentralized FL in UAV-enabled networks that involves placing the UAVs while ensuring the connectivity of the network of deployed UAVs. We dedicate the remaining chapters to novel design problems and key technologies for the physical layer of next-generation wireless systems, with an emphasis on millimeter-wave (mmWave) communications, massive MIMO, and hybrid beamforming. We introduce a novel antenna configuration called twin-ULA (TULA) and its composite configurations to generate sharp beams with maximal and uniform gain. We also introduce a novel beam alignment technique that maximizes the utility of transmission in the presence of multipath, efficiently use reconfigurable intelligent surfaces (RIS) to enhance mmWave coverage in urban environments, and address synchronization and calibration in distributed massive MIMO networks for 6G systems. Here, synchronization involves carrier frequency offset estimation and compensation. Calibration involves mitigating reciprocity mismatches in the digital and analog RF chains of the access points (APs) implementing hybrid beamforming, enabling efficient downlink channel estimation.
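To make the trust-aware SFC embedding objective concrete, the sketch below places a short chain of VNFs on a handful of nodes by exhaustive search. It is not the thesis's MILP or LP formulation, and it rests on several assumptions:

- Per-node trust scores and CPU capacities are hypothetical.
- A placement is scored by the product of its hosting nodes' trust values, as if each were an independent reliability.
- Routing paths and link constraints are ignored.

Exhaustive search only scales to toy instances, which is one reason optimization formulations such as an MILP and its LP relaxation are needed.

```python
from itertools import product
from math import prod

# Hypothetical infrastructure: node -> (trust score in [0, 1], CPU capacity).
NODES = {"A": (0.99, 4), "B": (0.90, 8), "C": (0.70, 16)}
# Hypothetical service chain: CPU demand of each VNF, in chain order.
CHAIN = [2, 3, 4]


def best_placement(nodes, chain):
    """Exhaustively map each VNF to a node, maximizing the product of trust scores.

    Placements that exceed any node's CPU capacity are discarded. Routing paths
    and link bandwidth are ignored in this sketch.
    """
    best, best_trust = None, -1.0
    for placement in product(nodes, repeat=len(chain)):
        load = {n: 0 for n in nodes}
        for node, demand in zip(placement, chain):
            load[node] += demand
        if any(load[n] > nodes[n][1] for n in nodes):
            continue  # capacity violated
        trust = prod(nodes[n][0] for n in placement)
        if trust > best_trust:
            best, best_trust = placement, trust
    return best, best_trust


placement, trust = best_placement(NODES, CHAIN)
print(placement, round(trust, 4))
```

Node A, the most trusted, can host only one of these VNFs. The optimum therefore puts one VNF on A and the other two on B, and ties among such placements are broken by search order.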
  • Item
    (2023) Shi, Guangyao; Tokekar, Pratap; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    With the rapid improvement in perception and planning technology, robots are increasingly used as smart, adaptive sensors to gather information in applications such as environmental monitoring, infrastructure inspection, and security and surveillance. To fully exploit the potential of robotic sensing, we need efficient and reliable decision-making techniques to decide when, where, and how to gather information. Such techniques need to account for the uncertainty and partial knowledge inherent in the working environment. The goal of this dissertation is to design algorithms that enable a multi-robot team to collectively and efficiently gather information on spatiotemporal fields without full knowledge of the environment. Our contributions span the full spectrum of knowledge about environmental conditions. At one extreme, the environmental model is fully known; at the other, it is unknown but can be learned from empirical data. We present several efficient (i.e., polynomial-time) and effective (i.e., with optimality or bounded approximation guarantees) algorithms for multi-robot information gathering. In the first part of the dissertation, we study coordination algorithms when the environmental model is fully or partially known. Specifically, for the case where the environmental model is fully known, we consider the challenge imposed by the team's connectivity requirement. We present an algorithm for connectivity-constrained submodular maximization for information gathering that requires intermittent communication among the robot team. For the case where the environment is partially known and uncertainty exists, we seek to make the multi-robot team robust to possible failures caused by the uncertainty. When the uncertainty is upper-bounded, we present a constant-factor approximation algorithm for robust multiple-path submodular orienteering.
When the uncertainty is stochastic and its distribution is known, we introduce two risk-sensitive coordination problems for aerial-ground long-term information gathering. In the second part of the dissertation, we study the case where the environmental model is initially unknown and needs to be learned from data. Classically, such learning is conducted independently of the downstream task. By contrast, we present a framework that incorporates the downstream decision-making problem into the learning process. This integration helps reduce the misalignment between the prediction model and the downstream task. Misalignment here means that a predictor with high predictive accuracy in the learning phase may still fail to yield good decisions in the downstream task. The general methodology for achieving this integration is to make the combinatorial optimization differentiable, so that it can be treated as a differentiable module in the learning process. In addition to algorithm design, we present empirical results for applications such as active target tracking, ocean monitoring, and persistent monitoring.
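Many information-gathering objectives are submodular: each extra observation adds less new information as more is already known. For monotone submodular objectives under a simple cardinality budget, the classic greedy rule achieves at least a (1 - 1/e) fraction of the optimum. The rule repeatedly adds the candidate with the largest marginal gain. The sketch below shows only that baseline on a hypothetical coverage objective. The dissertation's algorithms additionally handle connectivity, path, and robustness constraints that it omits.

```python
def coverage(selected, sensing):
    """Number of distinct cells observed by the selected locations (a submodular function)."""
    covered = set()
    for loc in selected:
        covered |= sensing[loc]
    return len(covered)


def greedy_select(sensing, budget):
    """Pick up to `budget` locations, each time adding the one with the largest marginal gain."""
    selected = []
    for _ in range(budget):
        candidates = [loc for loc in sensing if loc not in selected]
        if not candidates:
            break
        best = max(candidates, key=lambda loc: coverage(selected + [loc], sensing))
        selected.append(best)
    return selected


# Hypothetical sensing footprints: candidate location -> set of grid cells it observes.
SENSING = {
    "p1": {1, 2, 3, 4},
    "p2": {3, 4, 5},
    "p3": {5, 6, 7},
    "p4": {1, 7},
}
chosen = greedy_select(SENSING, 2)
print(chosen, coverage(chosen, SENSING))
```

After picking p1, the greedy step prefers p3 over p2. Although p2 covers three cells on its own, two of them are already observed.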
  • Item
    Leveraging Deep Generative Models for Estimation and Recognition
    (2023) PNVR, Koutilya; Jacobs, David W.; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Generative models are a class of statistical models that estimate the joint probability distribution over an observed variable and a target variable. In computer vision, generative models are typically used to model the joint probability distribution of a set of real image samples assumed to lie on a complex, high-dimensional image manifold. Recently proposed deep generative architectures such as Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and diffusion models (DMs) have been shown to generate photo-realistic images of human faces and other objects. These generative models have also become popular for other generative tasks such as image editing and text-to-image synthesis. However appealing the perceptual quality of generated images has become, the use of generative models for discriminative tasks such as visual recognition or geometry estimation has not been well studied. Moreover, with many kinds of powerful generative models gaining popularity, it is important to study their significance in other areas of computer vision. In this dissertation, we demonstrate the advantages of using generative models for applications beyond photo-realistic image generation. The first is Unsupervised Domain Adaptation (UDA) between synthetic and real datasets for geometry estimation; the second is text-based image segmentation for recognition. In the first half of the dissertation, we propose a novel generative UDA method for combining synthetic and real images when training networks to determine geometric information from a single image. Specifically, we use a GAN model to map both synthetic and real domains into a shared image space by translating only the domain-specific, task-related information from each domain. This model is connected to a primary network for end-to-end training. Ideally, this results in images from the two domains that present shared information to the primary network.
Compared to previous approaches, we demonstrate a larger reduction of the domain gap and much better generalization between synthetic and real data for geometry estimation tasks such as monocular depth estimation and face normal estimation. In the second half of the dissertation, we show the power of a recent class of generative models for improving an important recognition task: text-based image segmentation. Specifically, large-scale pre-training tasks like image classification, captioning, or self-supervised techniques do not incentivize learning the semantic boundaries of objects. However, recent generative foundation models built using text-based latent diffusion techniques may learn semantic boundaries, because they must synthesize intricate details about all objects in an image based on a text description. We therefore present a technique for segmenting real and AI-generated images using latent diffusion models (LDMs) trained on internet-scale datasets. First, we show that the latent space of LDMs (z-space) is a better input representation for text-based image segmentation than other feature representations, such as RGB images or CLIP encodings. This latent z-space is a compressed representation spanning several domains, such as different forms of art, cartoons, illustrations, and photographs. By training the segmentation models on it, we are also able to bridge the domain gap between real and AI-generated images. We show that the internal features of LDMs contain rich semantic information and present a technique, LD-ZNet, that further boosts the performance of text-based segmentation. Overall, we show improvements of up to 6% over standard baselines for text-to-image segmentation on natural images. For AI-generated imagery, we show close to 20% improvement compared to state-of-the-art techniques.
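The abstract reports relative improvements without naming the underlying metric. Intersection-over-union (IoU) is a widely used measure of segmentation quality, so the sketch below assumes it to make the evaluation step concrete. The thesis may report a different metric, and the masks here are hypothetical.

```python
def iou(pred, target):
    """Intersection-over-union of two binary masks (flat sequences of 0/1 values)."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    return inter / union if union else 1.0  # two empty masks count as a perfect match


def mean_iou(pairs):
    """Average IoU over (prediction, ground truth) mask pairs, e.g. one per text prompt."""
    scores = [iou(p, t) for p, t in pairs]
    return sum(scores) / len(scores)


# Hypothetical 2x3 masks flattened row by row.
pred = [1, 1, 0,
        0, 1, 0]
truth = [1, 0, 0,
         0, 1, 1]
print(iou(pred, truth))
```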
  • Item
    Advances in the Application of Superconducting and Photonic Circuits to Microwave Radiometers
    (2023) Turner, Charles Josiah; Murphy, Thomas E; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    In the fields of radio astronomy and remote sensing, there are application-driven requirements for wideband radiometers, hyperspectral spectrometers, and Radio Frequency Interference (RFI) mitigation. This work investigates the implementation of superconducting filters for RFI mitigation in ground-based radio astronomy, where cryogenic cooling is available. It also explores the feasibility of implementing Photonic Integrated Circuits (PICs) in spaceborne radiometers. Spaceborne instruments have strict size, weight, and power consumption (SWaP) requirements. PICs are intrinsically wideband and offer significant SWaP benefits for enhanced performance in radiometers. This thesis presents three topics in technology development for the advancement of radiometers. The first topic is the development of a thin-film, high-temperature superconductor (HTS) notch filter to reject a local, high-power RFI signal. The resonator topology was devised to minimize the necessary coupling between the transmission line and resonators. As demonstrated through measurements, this filter has an operating frequency range of 2-12 GHz and provides over 50 dB of rejection around 9.41 GHz. The measured maximum insertion loss is 0.6 dB in the lower pass-band and 2 dB in the upper pass-band, which can be reduced through improved packaging and operating the device at lower temperatures. This device currently demonstrates the largest 50-dB-rejection stop-band reported in the literature for thin-film HTS filters, at 4.3% fractional bandwidth. The second topic is a stochastic, non-linear, power-response model with supporting laboratory measurements for a photonics-enabled, heterodyne, microwave radiometer. The measurements are taken from a single-channel test device, and the results can be applied to improve the design and simulation accuracy of a multi-channel spectrometer.
This model is tested by comparing the measured gain of a photonic down-converter (PDC) under an applied continuous-wave microwave signal versus an adjustable microwave noise source. The PDC consists of a dual-drive Mach-Zehnder modulator with a microwave local oscillator (LO) used for down-conversion of the microwave carrier signal. Using these results, the dynamic range of the proposed instrument is quantified with improved accuracy. The third topic is the demonstration of thermo-reflectance microscopy (TRM) on a polymer-based photonic device. A spaceborne, photonics-enabled microwave radiometer needs to survive and operate in a space environment. Measuring the thermal profile of PICs is essential for creating more environmentally robust designs, but many feature sizes fall below the diffraction limit of traditional infrared thermography. TRM measures thermal profiles using visible-wavelength light, which lowers the diffraction limit and achieves sub-micron spatial resolution. Photonic wire bonds (PWBs) are important components for coupling different PICs without requiring active optical alignment between chips. Although TRM has been tested on semiconductors, it had not previously been demonstrated on PWBs. These results demonstrate the possibility of using TRM to test complete, multi-material PIC devices.
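The filter figures quoted above translate into linear quantities as follows. This worked arithmetic assumes that the 4.3% fractional bandwidth is referenced to the 9.41 GHz rejection frequency, which the abstract does not state explicitly. Under that assumption:

- The 50 dB stop band spans roughly 405 MHz.
- 50 dB of rejection attenuates the interferer's power by a factor of 100,000.
- Insertion losses of 0.6 dB and 2 dB pass about 87% and 63% of the in-band power, respectively.

```python
def db_to_power_ratio(db):
    """Linear power ratio corresponding to a level in dB."""
    return 10 ** (db / 10)


def transmitted_fraction(insertion_loss_db):
    """Fraction of input power that remains after a given insertion loss."""
    return 1 / db_to_power_ratio(insertion_loss_db)


def absolute_bandwidth(center_hz, fractional_bw):
    """Absolute bandwidth for a fractional bandwidth defined as BW / f0."""
    return center_hz * fractional_bw


rejection = db_to_power_ratio(50)              # 50 dB stop-band rejection
lower_pass = transmitted_fraction(0.6)         # 0.6 dB lower pass-band loss
upper_pass = transmitted_fraction(2.0)         # 2 dB upper pass-band loss
stop_band = absolute_bandwidth(9.41e9, 0.043)  # assumes f0 = 9.41 GHz
print(f"{rejection:.0f}x, {lower_pass:.1%}, {upper_pass:.1%}, {stop_band / 1e6:.0f} MHz")
```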
  • Item
    (2023) H P Elapatha Rajapaksha Siriwardena, Yashish Maduwantha; Espy-Wilson, Carol; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Acoustic-to-articulatory speech inversion involves the challenging task of deducing the kinematic state of various constriction synergies, including the lips, tongue tip, tongue body, velum, and glottis, in terms of their respective constriction degree and location coordinates. These coordinates are referred to as vocal tract variables (TVs). Developing Speech Inversion (SI) systems has gained attention in recent years, mainly due to their potential in a wide range of speech applications such as Automatic Speech Recognition (ASR), speech synthesis, speech therapy, and mental health assessment. Over the past few years, deep neural network (DNN) based models have propelled the development of SI systems to new heights. However, current SI systems still struggle with several problems: the lack of sufficiently large articulatory datasets, speaker dependence, poor performance on noisy speech, and a lack of generalizability across different articulatory datasets. Moreover, a major drawback of existing articulatory datasets is the lack of ground-truth data capturing the velar and glottal activity of speech. With this work, we try to address some of these challenges in developing effective SI systems. Our experiments are based on two publicly available articulatory datasets: the University of Wisconsin X-ray Microbeam (XRMB) dataset and the HPRC dataset. We show that using appropriate audio augmentation techniques to create synthetic data can further improve the performance of SI systems on both clean and noisy speech. We also show that using multi-task learning frameworks to carry out an auxiliary but related task can improve TV prediction. A key improvement came when the SI systems were forced to learn source features (aperiodicity, periodicity, and pitch) as additional targets.
Moreover, using self-supervised speech representations (HuBERT) and fine-tuning them for the downstream task of speech inversion resulted in improved performance. We also aimed to extend current SI systems to estimate velar and glottal activity. To this end, data from an ongoing data collection were used to derive and validate two parameters: nasalance, to capture velar constriction degree, and the electroglottography (EGG) envelope, to capture voicing. A separate speaker-independent SI system was subsequently trained to estimate the derived parameters and is one of the first systems to do so. This SI system, along with the conventional SI systems (trained to estimate lip and tongue TVs), provides a framework to estimate a complete articulatory representation of speech in a speaker-independent fashion. While improving and extending current SI frameworks, we also explored an unsupervised learning algorithm inspired by sensorimotor interactions in the human brain to perform audio and speech inversion. The proposed “MirrorNet” is a constrained autoencoder architecture. It is first used to learn, in an unsupervised manner, the controls of an off-the-shelf audio synthesizer (DIVA) that produce melodies, using only their auditory spectrograms. The results demonstrate how the MirrorNet discovers synthesizer parameters that generate melodies closely resembling the originals, including unseen melodies. It even determines the best set of parameters to approximate renditions of complex piano melodies generated by a different synthesizer. To extend the same idea to learning vocal tract controls for speech, we developed a DNN-based articulatory synthesizer (articulatory-to-acoustic forward mapping) to be incorporated as the motor plant of the MirrorNet.
The MirrorNet with this motor plant, once initialized with a minimal amount of ground-truth data (about 30 minutes of speech), can learn the articulatory representations (6 TVs + source features) with significantly better accuracy. Overall, this highlights the effectiveness of the MirrorNet’s learning algorithm in solving the conventional acoustic-to-articulatory speech inversion problem with minimal use of ground-truth articulatory data. To assess the practical utility of articulatory representations in real-world scenarios, we employed articulatory coordination features derived from TVs to detect and analyze articulatory-level alterations in the speech of individuals with schizophrenia. We show that schizophrenia subjects who have strong positive symptoms (e.g., hallucinations and delusions) and are markedly ill exhibit a more complex articulatory coordination pattern in facial and speech gestures than healthy controls. This distinction in coordination pattern is used to train a multimodal convolutional neural network (CNN) that uses video and audio data to distinguish schizophrenia subjects from healthy controls. Furthermore, we used TVs estimated by the best-performing SI system to detect mispronunciation of /ɹ/, a common speech sound disorder in children. The classification model trained with TVs performed better than the state-of-the-art approach based on hand-crafted, age- and sex-normalized formants. In essence, the work in this dissertation presents steps toward developing effective acoustic-to-articulatory speech inversion frameworks, and highlights the importance of utilizing articulatory representations in real-world applications.
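Articulatory coordination features are commonly built from a channel-delay correlation matrix. Each TV time series is copied at several time delays, and the correlations among all copies are collected into one matrix. The eigenvalue spectrum of that matrix summarizes how tightly the articulators are coordinated. The abstract does not spell out its construction, so the sketch below assumes this one and uses two synthetic trajectories. It stops before the eigen-decomposition, which could be done with, e.g., `numpy.linalg.eigvalsh`, since the matrix is symmetric.

```python
from math import cos, sin, sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def channel_delay_correlation(channels, delays):
    """Correlation matrix over every (channel, delay) copy of a multichannel signal.

    channels -- equal-length sequences, e.g. one per tract variable (TV)
    delays   -- non-negative integer delays in samples
    Each copy is the channel shifted by its delay and trimmed to a common length.
    """
    length = len(channels[0]) - max(delays)
    copies = [ch[d:d + length] for ch in channels for d in delays]
    return [[pearson(a, b) for b in copies] for a in copies]


# Two hypothetical TV trajectories (e.g. lip aperture and tongue-tip constriction degree).
tv_a = [sin(0.10 * t) for t in range(200)]
tv_b = [cos(0.13 * t) for t in range(200)]
matrix = channel_delay_correlation([tv_a, tv_b], delays=[0, 3, 7])
```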
  • Item
    (2023) Karunathilake, I.M Dushyanthi; Simon, Jonathan Z.; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Speech communication requires real-time processing of rapidly varying acoustic sounds across various speech landmarks, while recruiting complex cognitive processes to derive the intended meaning. Behavioral studies have shown that speech comprehension is altered by factors such as aging, linguistic content, and intelligibility, yet the systematic neural mechanisms underlying these changes are not well understood. This thesis explores how the neural bases of speech comprehension are modulated by each of these factors in three experiments. Each experiment compares speech representations in cortical responses measured with magnetoencephalography (MEG). We use neural encoding (Temporal Response Functions, TRFs) and decoding (reconstruction accuracy) models. These models describe the mapping between stimulus features and cortical responses and are instrumental in understanding cortical temporal processing mechanisms in the brain. First, we investigate age-related changes in the timing and fidelity of the cortical representation of speech in noise. Understanding speech in a noisy environment becomes more challenging with age, even in healthy aging. Our findings demonstrate that some of the difficulties older adults experience in understanding speech in noise are accompanied by age-related temporal processing differences in the auditory cortex. This is an important step toward incorporating neural measures into both diagnostic evaluation and treatment of speech comprehension problems in older adults. Next, we investigate how the cortical representation of speech is influenced by linguistic content. To do so, we compare neural responses to four types of continuous speech-like passages: non-speech, non-words, scrambled words, and narrative. We find neural evidence for emergent features of speech processing, from acoustics to linguistic processes at the sentential level, as speech input is processed in incremental steps.
We also show the gradual computation of hierarchical speech features over time, encompassing both bottom-up and top-down mechanisms. At the linguistic level, top-down mechanisms give rise to an N400-like response, suggesting the involvement of predictive coding mechanisms. Finally, we identify potential neural markers of speech intelligibility using a priming paradigm, in which intelligibility is varied while the acoustic structure is kept constant. Our findings suggest that the segmentation of sounds into words emerges with better speech intelligibility. This effect is strongest at ~400 ms in the prefrontal cortex (PFC), in line with the engagement of top-down mechanisms associated with priming. Taken together, this thesis furthers our understanding of the neural mechanisms underlying speech comprehension and identifies potential objective neural markers for evaluating the level of speech comprehension.
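A TRF treats the neural response as the stimulus feature convolved with a short filter, r(t) = Σ_k h(k) s(t − k) + noise. The filter h is the TRF, and its lags correspond to response latencies. The sketch below estimates h by ridge-regularized least squares on a lagged design matrix, using a synthetic stimulus and response. The abstract does not specify the estimator used in the thesis (boosting and ridge regression are both common), so this choice and all data here are assumptions.

```python
import random


def lagged_design(stim, n_lags):
    """Row t holds stim[t], stim[t-1], ..., stim[t-n_lags+1] (zeros before onset)."""
    return [[stim[t - k] if t - k >= 0 else 0.0 for k in range(n_lags)]
            for t in range(len(stim))]


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def estimate_trf(stim, resp, n_lags, ridge=1e-3):
    """Ridge least-squares TRF: h = (X^T X + ridge * I)^-1 X^T r."""
    x = lagged_design(stim, n_lags)
    xtx = [[sum(row[i] * row[j] for row in x) + (ridge if i == j else 0.0)
            for j in range(n_lags)] for i in range(n_lags)]
    xtr = [sum(row[i] * r for row, r in zip(x, resp)) for i in range(n_lags)]
    return solve(xtx, xtr)


def predict(stim, trf):
    """Convolve the stimulus with the TRF: r[t] = sum_k trf[k] * stim[t-k]."""
    return [sum(h * s for h, s in zip(trf, row)) for row in lagged_design(stim, len(trf))]


rng = random.Random(0)
stim = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # hypothetical stimulus envelope
true_trf = [0.0, 0.5, 1.0, 0.4, -0.3]              # hypothetical filter, one value per lag
resp = [r + rng.gauss(0.0, 0.1) for r in predict(stim, true_trf)]  # noisy response
est = estimate_trf(stim, resp, n_lags=5)
print([round(h, 2) for h in est])
```

With real MEG data, lags in samples map to latencies by dividing by the sampling rate. The decoding direction (reconstructing the stimulus from the response) would swap the roles of `stim` and `resp`.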
  • Item
    (2023) Zhu, Guozhen; Liu, K. J. Ray; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Environmental perception is pivotal for intelligent systems, enabling them to capture, interpret, and act upon contextual cues. Understanding the environment (its objects, occupants, floor plan, and dynamics) is fundamental to the effective deployment of technologies such as robotics, the Internet of Things (IoT), and augmented reality. Traditional perception mechanisms, such as video surveillance and sensor-based monitoring, are often hampered by privacy concerns, substantial infrastructure costs, energy inefficiency, and limited coverage. In contrast, WiFi sensing stands out for being non-intrusive, cost-effective, and pervasive. By capitalizing on the WiFi signals that already permeate indoor and outdoor spaces, WiFi sensing requires no extra hardware yet still provides rich information about the environment. Its ability to penetrate walls and other obstructions further extends its range to areas beyond the reach of conventional sensors. These advantages make WiFi sensing valuable across diverse applications, spanning smart homes, health monitoring, location-based services, and security systems. Enhancing environmental perception via WiFi sensing is therefore not only an advance in ubiquitous computing but also a step toward safer, more efficient, and smarter environments. This dissertation explores monitoring and mapping environments by leveraging motion analytics based on commodity WiFi. In the first part of this dissertation, we introduce an efficient and cost-effective system for precise floor plan construction that integrates RF and inertial sensing techniques. The proposed system combines detailed information from RF tracking with broader context from inertial measurements, such as magnetic field strength, to produce an accurate map.
The system employs a robot for trajectory collection and requires only a single access point installed at an arbitrary location in the space, both of which are widely available today. The system can produce detailed maps even from minimal data, making it adaptable to diverse structures such as shopping centers, offices, and residences without significant expense. We validated the proposed system using a DJI RoboMaster S1 robot equipped with standard WiFi in three distinct buildings, demonstrating its ability to produce reliable maps of the intended regions. Given the widespread presence of WiFi infrastructure and the increasing prevalence of domestic robots, the proposed approach paves the way for ubiquitous intelligent systems offering indoor mapping services. In the second and third parts, we present two strategies that leverage WiFi to identify the motion of humans and various non-human subjects. First, we detail a novel passive, non-intrusive method tailored for edge devices. By extracting and analyzing physically and statistically plausible motion features, our system recognizes humans and diverse non-human subjects through walls using a single WiFi link. Experimental results from four distinct buildings with various moving subjects validate its efficiency on edge devices. For more complex cases, we then put forth a deep learning-based WiFi sensing paradigm. This part examines the efficacy of diverse deep learning models for human and non-human object recognition, and probes the feasibility of transferring image-trained models to the WiFi sensing task. Because it is built on a statistic that is robust and invariant to environment and position, this system adapts efficiently to new surroundings.
Comprehensive experimental evaluations affirm our framework's precision in pinpointing intricate human and non-human subjects, and readiness for integration into prevalent intelligent systems, thereby boosting their perceptual capacities. In the final part of this dissertation, we propose a pioneering through-wall indoor intrusion detection system that adeptly filters out interference from non-human subjects using ubiquitous WiFi signals. A novel deep learning architecture is proposed for single-link WiFi signal analysis. It employs a ResNet-18-based module to extract features of indoor moving subjects and an LSTM-based module to incorporate temporal information for efficient intrusion detection. Notably, the system is invariant to environmental changes, angles, and positions, enabling swift deployment in new environments without additional training. Evaluation in five indoor environments with various interference yielded high intrusion detection accuracy and a low false alarm rate, even without model tuning for unseen settings. The results underscore the system's exceptional adaptability, positioning it as a top contender for widespread intelligent indoor security applications.
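The abstract mentions motion features and a statistic that is invariant to environment and position but does not define them. A statistic often used in the WiFi-sensing literature is the sample autocorrelation function (ACF) of the channel state information (CSI) power on a subcarrier. The sketch that follows computes it, assuming uniformly sampled, real-valued CSI power; the function name and the toy traces are hypothetical, and the ACF is an illustrative stand-in, not necessarily the feature this dissertation uses.

```python
def sample_acf(series, max_lag):
    """Normalized sample autocorrelation rho(0..max_lag) of a real-valued time series."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    energy = sum(c * c for c in centered)
    if energy == 0:  # a perfectly constant trace carries no motion information
        return [1.0] + [0.0] * max_lag
    return [sum(centered[t] * centered[t + lag] for t in range(n - lag)) / energy
            for lag in range(max_lag + 1)]


# Hypothetical toy CSI power traces (not measured data):
noise_like = [1.0, -1.0] * 50                  # decorrelates from one sample to the next
motion_like = [float(t) for t in range(100)]   # slowly varying, temporally correlated
rho_noise = sample_acf(noise_like, 1)[1]
rho_motion = sample_acf(motion_like, 1)[1]
```

Slowly varying fluctuations give a lag-1 ACF near 1 (0.97 for the ramp), while rapidly decorrelating fluctuations do not (-0.99 for the alternating trace). Because the ACF is normalized by the trace's own energy, it does not depend on the absolute received power, which is one reason such statistics are considered less tied to a specific deployment.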
  • Item
    (2023) Singhabahu, Chanaka Manoj; Khaligh, Alireza; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Single-stage power conversion with high-frequency transformer isolation has gained interest as a key enabler for improving the efficiency and power density of electrical systems. Traditionally, power conversion from one voltage level to multiple voltage levels is performed using discrete AC-DC and/or DC-DC converter modules that meet the isolation requirements. Because of the low level of integration between the high-frequency magnetic link and the power electronic converters that drive it, such solutions suffer from large volume and weight and from low efficiency. Nevertheless, such multiple-input/output converter architectures are extensively used in a wide range of applications, including electric vehicles, DC smart homes, data centers, and personal computers. Whereas these architectures are realized as combinations of interconnected discrete power converters, this Ph.D. dissertation presents a multi-port energy router capable of integrating multiple systems with different voltage levels, resulting in substantial improvements in power density and efficiency. The proposed energy router employs multi-active-bridge (MAB) converter-derived topologies as the fundamental building blocks to create an electrically and magnetically integrated, scalable, single-stage power electronic converter that can be extended to n ports. Several key challenges that have impeded the use of MAB converters are investigated in detail. Estimating the optimal modulation parameters of an MAB converter is vital for achieving the desired converter performance. Accurate modeling of the high-frequency AC link plays a major role in determining the modulation parameters because of the sophisticated magnetic coupling relationships. 
As the first contribution of this dissertation, a full-order n × n impedance-matrix-based model that captures all the coupling information of the magnetic link is used to obtain the desired power flow, minimize conduction losses, and analyze the zero-voltage-switching (ZVS) conditions of the MAB converter topology. A frequency-domain model of the MAB converter is developed that uses the impedance matrix to solve for the port currents. The proposed model is then used to formulate a constrained numerical optimization routine that finds the optimal modulation parameters, minimizing conduction and switching losses. The inductance matrix of the high-frequency AC link is further used together with the frequency-domain model to analyze the port ZVS conditions by investigating the equivalent inductive energy seen at each port in the high-frequency AC link. The broad range of operating points (port loading conditions and voltage levels) in an MAB converter makes the design of efficient and power-dense magnetic components a complex problem. Traditional optimization approaches developed for two-winding transformers are therefore not feasible, owing to the large number of design parameters and modulation variables and the effect of the port loading conditions on the dynamic AC resistance and core losses. As the second contribution, planar PCB-based magnetics are developed using a comprehensive multi-objective design and optimization framework to realize a highly efficient and compact planar magnetic link for the MAB converter. As a key component of this framework, accurate and scalable analytical models for conduction and core loss estimation are developed that capture loss mechanisms distinctive to multi-winding transformers. Using the proposed loss models, the design framework applies multi-objective optimization to all magnetic components in the high-frequency link, namely the multi-winding transformer and the series branch inductors. 
The proposed approach determines the optimal combination of magnetic core geometries, turns ratios, number of turns, branch inductances, and winding interleaving configuration, with the objectives of minimizing the operating-point-weighted efficiency drop and the magnetic volume. Finally, a Pareto-optimal magnetic link design is selected. The proposed methods for obtaining optimal modulation parameters and designing the high-frequency planar magnetic link are validated using comprehensive circuit and finite-element-analysis (FEA) simulations. Experimental verification is performed on a gallium nitride (GaN)-based 4-port, 1-kW DC-DC MAB converter with ports rated at 420 V, 48 V, 24 V, and 12 V. Building on the modeling, design, and optimization methodologies of these two works, a new family of MAB-derived converter topologies with AC ports is proposed as the third contribution of this dissertation. In particular, single-stage power conversion between DC and three-phase AC is investigated. The operating principles of the proposed topologies are discussed in detail, along with systematic modeling and optimal modulation methods based on the concepts developed above for DC-DC MAB converters. The circuit operation is also investigated in terms of ZVS. To validate the topology configurations and modulation methods, comprehensive Simulink simulation models are developed. Compared with traditional two-stage converter systems comprising DC-DC and DC-AC stages, the proposed topologies offer single-stage power conversion, ZVS, high efficiency, and galvanic isolation.
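The frequency-domain idea in the first contribution (use the magnetic link's matrix description to solve for harmonic port currents, then sum the power carried by each harmonic) can be sketched in a simplified setting. The sketch assumes a lossless link described by an invertible n × n inductance matrix referred to one side, two-level 50% duty square-wave bridge voltages with a phase shift per port, and no winding or core losses. It is not the dissertation's full impedance-matrix model or optimization routine, and all function names and numbers are hypothetical. The two-port case is checked against the well-known dual-active-bridge expression P = V1·V2·φ(π - |φ|) / (2π²·f_sw·L), where L is the equivalent transfer inductance.

```python
import cmath
import math


def solve_complex(a, b):
    """Solve a @ x = b for a small dense complex system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def port_powers(inductance, amplitudes, phases, f_sw, max_harmonic=2001):
    """Average power each bridge injects into a lossless coupled-inductor link.

    inductance : n x n inductance matrix referred to one side (H), assumed invertible
    amplitudes : square-wave bridge voltage amplitudes referred to the same side (V)
    phases     : phase shift of each square wave (rad); positive means lagging
    """
    n = len(amplitudes)
    omega = 2 * math.pi * f_sw
    powers = [0.0] * n
    for k in range(1, max_harmonic + 1, 2):  # a 50% duty square wave has only odd harmonics
        # Fourier phasor of each square wave: magnitude 4V/(k*pi), angle -k*phase
        v = [4 * amp / (k * math.pi) * cmath.exp(-1j * k * ph)
             for amp, ph in zip(amplitudes, phases)]
        # Port equation v = j*k*omega*L*i, solved for the harmonic port currents i
        i = solve_complex(inductance, [vp / (1j * k * omega) for vp in v])
        for p in range(n):
            powers[p] += 0.5 * (v[p] * i[p].conjugate()).real
    return powers


# Two-port sanity check against the closed-form dual-active-bridge expression
L = [[20e-6, 10e-6], [10e-6, 20e-6]]
f_sw, phi = 100e3, 0.4
p = port_powers(L, [400.0, 400.0], [0.0, phi], f_sw)
l12 = (L[0][0] * L[1][1] - L[0][1] ** 2) / L[0][1]  # equivalent transfer inductance
closed_form = 400.0 * 400.0 * phi * (math.pi - phi) / (2 * math.pi ** 2 * f_sw * l12)
```

For a four-port link, passing a 4 × 4 inductance matrix and four phase shifts gives the power split among the ports; the entries summing to approximately zero is a quick check of the lossless assumption.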
  • Item
    (2023) Hsu, Wei-Lun; Dagenais, Mario; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    In the first part of this thesis, we discuss the design and fabrication of arrayed waveguide gratings implemented on the Si3N4/SiO2 integration platform. Arrayed waveguide gratings (AWGs) are widely applied in telecommunication systems as multiplexers/demultiplexers and signal routers, as well as in optical sensing, quantum computing, and spectroscopy. They are also considered a promising solution to some major challenges in observational astronomy. Our AWG designs and devices are based on a 100-nm Si3N4-on-SiO2 platform. Three-stigmatic-point (TSP) AWGs are designed and demonstrated with a flat output image surface, which can be cleaved so that cross-dispersion optics can be applied for astronomical observation. V-shaped and crossover structures are introduced. The V-shaped structure is based on the structure used in RSoft, while the crossover structure overlaps two free spectral ranges (FSRs) and shortens the arrayed waveguides. For a lower-resolving-power design with the V-shaped structure, the peak transmission reaches -1.9 dB and the highest resolving power is around 5,300. For the higher-resolving-power design with the crossover structure, the peak transmission is -2.0 dB and the maximum resolving power exceeds 18,000. The performance of all three input channels is very consistent, despite prominent side lobes due to larger phase errors. The degradation of resolving power within one FSR is only 6%. A cascaded AWG is developed to broaden the FSR without increasing the footprint. The design approach is a small primary AWG with a broad FSR followed by multiple large secondary AWGs. However, the cascaded AWGs require a flat response from the primary AWG (a flat-top AWG) to prevent large losses at the outer channels and at the channel cross-points. 
The flat-top primary AWG design is based on modifying the power profile along the input aperture and the phase distribution along the output aperture, creating a sinc function as the input signal into the output free propagation region (FPR). A three-stigmatic-point (TSP) AWG is used as the secondary AWG to achieve better cross-dispersion performance. One-top-hat or two-top-hat layouts are used in the design. Experimental results show that the Rowland primary AWG has higher peak transmission but suffers significant loss at the channel cross-points, whereas the flat-top primary AWG has slightly lower peak transmission but improves transmission at the channel cross-points by more than 12 dB. However, phase errors generate prominent side lobes and degrade the crosstalk. A cascaded AWG with a flat-top primary stage shows a flat output response within the passband, but the Rowland primary AWG performs better at filtering out unwanted signals outside the passband. In part 2 of this thesis, we present our work on realizing high-performance perovskite-based solar cells. A FAxMA(1-x)PbI3 perovskite solar cell with a bandgap tunable from 1.59 to 1.50 eV is proposed. A superstrate configuration with an inverted planar structure is adopted. The structure of our FAxMA(1-x)PbI3 perovskite solar cell is FTO glass/PTAA with m-MTDATA/perovskite/PCBM/Ag. Sequential PTAA doping and solvent-assisted annealing techniques are used to improve the performance of the FAxMA(1-x)PbI3 perovskite solar cell. SEM images clearly show that the MAPbI3 (x=0) film has the highest degree of crystallinity, with an average grain size over 2 µm. As the FAI proportion increases, the degree of crystallinity decreases, resulting in smaller grains. 
FA0.33MA0.67PbI3 is the optimal composition for a single-junction solar cell, with a power conversion efficiency (PCE) of 16.5%, an open-circuit voltage (Voc) of 1.02 V, and a short-circuit current density (Jsc) of 24.5 mA/cm2. The extracted fill factor (FF) of 66% reflects its lower crystallinity. The external quantum efficiency (EQE) of the FA0.33MA0.67PbI3 perovskite solar cell is above 90% over a broad spectral range from 400 nm to over 600 nm and remains above 80% around 760 nm, and the absorption onset is pushed to 820 nm owing to the lower optical bandgap of 1.54 eV. The MAPbI3 solar cell, with an optical bandgap of 1.59 eV, is a good fit as the top cell paired with a copper indium selenide (CIS) bottom cell with a bandgap of 1 eV. A four-terminal perovskite-CIS tandem solar cell is proposed. I-V characteristics and EQE are measured to investigate its performance. The champion cell demonstrates a PCE of 19.5%, 3 percentage points higher than the optimized single-junction FA0.33MA0.67PbI3 perovskite solar cell. If a freshly fabricated CIS bottom cell were used in the tandem, the overall PCE is expected to exceed 20%.
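The reported figures of merit are linked by PCE = Voc × Jsc × FF / Pin. A quick consistency check follows, assuming standard AM1.5G illumination of 100 mW/cm2 (the abstract does not state the input power); the function name is hypothetical.

```python
def pce_percent(voc_v, jsc_ma_per_cm2, fill_factor, pin_mw_per_cm2=100.0):
    """PCE (%) = Voc * Jsc * FF / Pin; V x mA/cm^2 gives mW/cm^2, matching Pin's units."""
    return 100.0 * voc_v * jsc_ma_per_cm2 * fill_factor / pin_mw_per_cm2


# Values reported above for the FA0.33MA0.67PbI3 cell, under an assumed 100 mW/cm^2 input
single_junction = pce_percent(1.02, 24.5, 0.66)
```

This gives about 16.49%, consistent with the reported 16.5%.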
  • Item
    (2023) Xie, Ti; Gong, Cheng; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Magnetism has played a crucial role in both fundamental research and technological advancement, from ancient compasses to modern spintronics. With the advent of artificial intelligence and the increasing demand for high-volume data storage, there have been significant efforts to reduce the dimensionality of memory materials. Recently, the discovery of two-dimensional magnetic van der Waals materials has enabled the observation of long-range magnetic order in monolayer crystals, which exhibit high sensitivity to external stimuli such as optical incidence, mechanical strain, and chemical functionalization. Our systematic work focuses on the efficient control of two-dimensional magnetism through multiple external stimuli, including chemical, optical, electrical, and mechanical means. These works achieved the effective control of a wide range of magnetic properties of two-dimensional magnets, such as Curie temperatures, magnetic coercivities, domain profiles, and magnetic phases. These research achievements will provide valuable insights into the fundamentals of two-dimensional magnetism and its interplay with external stimuli, paving the way for advancing the nanoscale spintronic and photonic devices in ultrathin platforms.
  • Item
    Methods and Tools for Real-Time Neural Image Processing
    (2023) Xie, Jing; Bhattacharyya, Shuvra; Chen, Rong; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    As a rapidly developing form of bioengineering technology, neuromodulation systems extract information from signals acquired from the brain and use that information to stimulate brain activity. Neuromodulation has the potential to treat a wide range of neurological diseases and psychiatric conditions, as well as to improve cognitive function. Neuromodulation integrates neural decoding and stimulation. As one of the two core parts of neuromodulation systems, neural decoding subsystems interpret signals acquired through neuroimaging devices. Neuroimaging is a field of neuroscience that uses imaging techniques to study the structure and function of the brain and the rest of the central nervous system. Extracting information from neuroimaging signals, as neural decoding requires, is challenging because neuromodulation systems demand real-time, energy-efficient, and accurate processing of large-scale, high-resolution image data. To address these challenges, we develop new methods and tools for the design and implementation of efficient neural image processing systems. Our contributions are organized along three complementary directions. First, we develop a prototype system for real-time neuron detection and activity extraction called the Neuron Detection and Signal Extraction Platform (NDSEP). This highly configurable system processes neural images from video streams in real time or offline, and applies dataflow modeling techniques to enable extensibility and experimentation with a wide variety of image processing algorithms. Second, we develop a parameter optimization framework to tune the performance of neural image processing systems. This framework, referred to as the NEural DEcoding COnfiguration (NEDECO) package, automatically optimizes arbitrary collections of parameters in neural image processing systems under customizable constraints. 
The framework allows system designers to explore alternative neural image processing trade-offs involving execution time and accuracy. NEDECO is also optimized for efficient operation on multicore platforms, which speeds up the parameter optimization process. Third, we develop a neural network inference engine targeted at mobile devices. The engine can be applied to neural network implementation in many application areas, including neural image processing. The inference engine, called ShaderNN, is the first neural network inference engine that exploits both graphics-centric abstractions (fragment shaders) and compute-centric abstractions (compute shaders). Integrating fragment shaders and compute shaders makes better use of the parallel computing capabilities of GPUs on mobile devices. ShaderNN performs particularly well on models with small numbers of parameters.
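The abstract says NDSEP applies dataflow modeling but does not show what that looks like. In dataflow modeling, an application is a directed graph of actors connected by first-in, first-out (FIFO) edges, and an actor fires only when each input edge holds the number of tokens it consumes. A minimal sketch of that execution rule follows; the actor names, the two-stage pipeline, and the naive scheduler are hypothetical illustrations, not NDSEP's actual design or API.

```python
from collections import deque


class Actor:
    """A dataflow actor: fires only when every input FIFO holds `consume` tokens."""

    def __init__(self, name, func, inputs, outputs, consume=1):
        self.name = name
        self.func = func          # maps one token batch per input to one token list per output
        self.inputs = inputs      # list of deques (incoming edges)
        self.outputs = outputs    # list of deques (outgoing edges)
        self.consume = consume

    def can_fire(self):
        return bool(self.inputs) and all(len(q) >= self.consume for q in self.inputs)

    def fire(self):
        batches = [[q.popleft() for _ in range(self.consume)] for q in self.inputs]
        for q, tokens in zip(self.outputs, self.func(*batches)):
            q.extend(tokens)


def run_until_idle(actors):
    """Naive dynamic scheduler: keep firing enabled actors until none can fire."""
    firings = 0
    progress = True
    while progress:
        progress = False
        for actor in actors:
            while actor.can_fire():
                actor.fire()
                firings += 1
                progress = True
    return firings


# Hypothetical two-actor pipeline: scale each frame value, then average pairs of frames.
raw, scaled, out = deque(), deque(), deque()
scale = Actor("scale", lambda b: ([2.0 * x for x in b],), [raw], [scaled])
pair_avg = Actor("pair_avg", lambda b: ([sum(b) / len(b)],), [scaled], [out], consume=2)

raw.extend([1.0, 2.0, 3.0, 4.0])
firings = run_until_idle([scale, pair_avg])
```

Because each actor touches only its own FIFOs, a new processing stage can be swapped in without changing its neighbors, which is the kind of extensibility the abstract attributes to dataflow modeling.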
  • Item
    (2023) Han, Jinjing; Ghodssi, Reza; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Serotonin (5-hydroxytryptamine, 5-HT) plays a crucial role as a monoamine neurotransmitter, regulating various behavioral and physiological functions in the brain and peripheral systems. Its effects encompass emotions, behaviors, gastrointestinal motility, hemostasis, and cardiovascular function. Dysregulation of the serotonergic system and imbalances in 5-HT levels have been associated with psychiatric disorders, underscoring its potential as a biomarker for conditions such as anxiety disorders, depression, Alzheimer's disease, and impulsive aggressiveness. However, the precise mechanisms by which 5-HT modulates these physiological conditions and behavioral processes remain unknown, necessitating sensing tools that monitor 5-HT dynamics at specific locations. Traditional techniques such as high-performance liquid chromatography (HPLC) and enzyme-linked immunosorbent assay (ELISA) have been employed to measure 5-HT concentrations in biological samples. However, these offline methods provide information only at the end of an experiment and lack spatial and temporal resolution. Because of the rapid extracellular release and uptake of 5-HT, detection techniques with high spatiotemporal resolution are needed to investigate serotonergic modulation. This dissertation focuses on the development of minimally invasive neurochemical sensing systems that address the challenges of real-time 5-HT sensing and facilitate in vitro and in vivo investigation of serotonergic modulation. Two sensing systems were developed. For in vitro 5-HT sensing, surface-modified microelectrodes with a single carbon fiber were developed and integrated with a portable potentiostat for point-of-care (POC) applications. These microelectrodes were tested for detecting cell-secreted 5-HT in vitro and 5-HT in homogenized crayfish nerve cord samples. The portable system exhibited a sensitivity of 74 nM/µM with a limit of detection (LOD) of 140 nM. 
Moreover, it was tested for detecting 5-HT in artificial urine, showcasing its potential as a POC device for early diagnosis of serotonin syndrome from urine tests. For in vivo 5-HT sensing, surface-modified microelectrodes with multiple carbon fibers were developed to enhance mechanical robustness specifically for in vivo applications. After integration with a miniature PCB, the device was able to co-detect dopamine (DA) and 5-HT at sub-micromolar concentrations with wireless communication. The integrated, untethered implantable system demonstrated simultaneous in vivo monitoring of DA and 5-HT in freely moving crayfish during injection events. Overall, these systems offer electrochemical 5-HT sensing solutions for both in vitro and in vivo applications, providing reliable tools to obtain real-time 5-HT dynamics with high spatial resolution. This capability significantly enhances our ability to investigate precise 5-HT signaling and the mechanisms underlying serotonergic modulation in disorder development and behavioral processes.
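The abstract reports a sensitivity (the slope of the calibration curve) and a limit of detection but does not say how the LOD was derived. A common convention is LOD = 3σ_blank / S, where σ_blank is the standard deviation of repeated blank measurements and S is the sensitivity. The sketch that follows computes both from calibration data; the 3σ convention, the function names, and the numbers are assumptions for illustration only, not the dissertation's data or procedure.

```python
import statistics


def calibration_fit(concentrations, responses):
    """Ordinary least-squares slope (sensitivity) and intercept of response vs. concentration."""
    n = len(concentrations)
    mx = sum(concentrations) / n
    my = sum(responses) / n
    sxx = sum((x - mx) ** 2 for x in concentrations)
    sxy = sum((x - mx) * (y - my) for x, y in zip(concentrations, responses))
    slope = sxy / sxx
    return slope, my - slope * mx


def limit_of_detection(blank_responses, slope, k=3.0):
    """LOD = k * (sample standard deviation of blank measurements) / sensitivity."""
    return k * statistics.stdev(blank_responses) / slope


# Made-up illustrative numbers (not data from the dissertation): concentration in uM, current in nA
slope, intercept = calibration_fit([0.0, 1.0, 2.0, 3.0], [0.5, 2.5, 4.5, 6.5])
lod = limit_of_detection([0.9, 1.0, 1.1], slope)  # in the same concentration unit (uM)
```

With these toy values the sensitivity is 2 nA/µM and the LOD is 0.15 µM; a steeper calibration slope or a quieter blank both lower the LOD.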
  • Item
    Microwave Nonlinearities in Photodiodes
    (1994) Williams, Keith Jake; Dagenais, Mario; Electrical & Computer Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, MD)
    The nonlinearities in p-i-n photodiodes have been measured and numerically modeled. Harmonic distortion, response reduction, and sinusoidal output distortion measurements were made with two single-frequency, offset-phase-locked Nd:YAG lasers, which provided a source dynamic range greater than 130 dB, a 1 MHz to 50 GHz frequency range, and optical powers up to 10 mW. A semi-classical approach was used to solve the carrier transport in a one-dimensional p-i-n photodiode structure. This required the simultaneous solution of three coupled nonlinear differential equations: Poisson's equation and the hole and electron continuity equations. Space-charge electric fields, loading in the external circuit, and absorption in undepleted regions next to the intrinsic region all contributed to the nonlinear behavior described by these equations. Numerical simulations were performed to investigate and isolate the various nonlinear mechanisms. For intrinsic-region electric fields below 50 kV/cm, the nonlinearities were found to be influenced primarily by the change in hole and electron velocities induced by the space-charge electric field. Between 50 and 100 kV/cm, the nonlinearities were influenced primarily by changes in electron velocity at frequencies above 5 GHz and by p-region absorption below 1 GHz. Above 100 kV/cm, only p-region absorption could explain the observed nonlinear behavior; only 8 to 14 nm of undepleted absorbing material next to the intrinsic region was needed to model the observed second-harmonic distortion of -60 dBc at 1 mA. Simulations were performed at high power densities to explain the observed response reductions and time distortions. A radially inward component of electron velocity was discovered and, under certain conditions, was estimated to have the same magnitude as the axial velocity. 
The model was extended to predict that maximum photodiode currents of 50 mA should be possible before a sharp increase in nonlinear output occurs. For capacitively-limited devices, the space-charge-induced nonlinearities were found to be independent of the intrinsic region length, while external circuit loading was determined to cause higher nonlinearities in shorter devices. Simulations indicate that second harmonic improvements of 40 to 60 dB may be possible if the photodiode can be fabricated without undepleted absorbing regions next to the intrinsic region.
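Distortion figures such as -60 dBc express a harmonic's level relative to the fundamental (carrier). The sketch that follows shows how such a level can be read from a coherently sampled output waveform using single DFT bins. The quadratic response y = x + a2·x² is a hypothetical stand-in chosen because its second harmonic is known analytically (20·log10(a2·A/2) dBc for a tone of amplitude A); it is not the dissertation's drift-diffusion photodiode model, and the function names are hypothetical.

```python
import cmath
import math


def dft_bin_magnitude(samples, k):
    """Magnitude of DFT bin k (direct sum; adequate for short records)."""
    n = len(samples)
    return abs(sum(s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(samples)))


def harmonic_dbc(samples, fundamental_bin, order=2):
    """Level of the given harmonic relative to the fundamental, in dBc."""
    ratio = (dft_bin_magnitude(samples, order * fundamental_bin)
             / dft_bin_magnitude(samples, fundamental_bin))
    return 20 * math.log10(ratio)


# Hypothetical memoryless response y = x + a2*x**2 driven by a unit-amplitude tone.
# cos^2 = (1 + cos 2wt) / 2, so the second harmonic sits at 20*log10(a2*A/2) dBc.
n, f_bin, amplitude, a2 = 256, 8, 1.0, 0.002
x = [amplitude * math.cos(2 * math.pi * f_bin * t / n) for t in range(n)]
y = [v + a2 * v * v for v in x]
level = harmonic_dbc(y, f_bin)
```

Here the second harmonic comes out at -60 dBc. Choosing an integer number of tone periods per record (coherent sampling) keeps each tone in a single bin, so no windowing is needed for the comparison.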
  • Item
    Cardiovascular Physiological Monitoring Based on Video
    (2023) Gebeyehu, Henok; Wu, Min; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Regular, continuous monitoring of the heart helps maintain cardiovascular health because it enables early detection of potentially life-threatening cardiovascular diseases. Typically, the devices required for continuous monitoring are found in clinical settings, but recent research has advanced remote physiological monitoring and expanded the options for continuous monitoring at home. This thesis focuses on further extending the monitoring capabilities of consumer electronic devices to explore the feasibility of reconstructing electrocardiograms (ECGs) from smartphone camera video. First, the relationship between skin tone and remote physiological sensing is examined, since variations in melanin concentration among people of diverse skin tones can affect remote physiological sensing. In this work, a study examines whether the performance disparity caused by melanin differences can be reduced by choosing the skin sites from which the physiological signal is collected. Second, the physiological signals obtained in the first part are enhanced to improve their signal-to-noise ratio and are used to infer the ECG as part of a novel technique that emphasizes interpretability as a guiding principle. The findings of this work could enable and promote remote sensing of a physiological signal that is more informative than what current remote sensing provides.
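Camera-based remote photoplethysmography pipelines typically start by averaging a color channel over a skin region in each frame to obtain a pulse signal. A minimal sketch of one downstream step, estimating heart rate as the strongest spectral component within a plausible heart-rate band, follows. The band limits (0.7 to 4 Hz, i.e. 42 to 240 bpm), the function name, and the synthetic trace are assumptions; this is not the thesis's skin-site study or ECG inference technique.

```python
import cmath
import math


def pulse_rate_bpm(signal, fs, low_hz=0.7, high_hz=4.0):
    """Return the frequency (beats per minute) of the strongest DFT bin in [low_hz, high_hz]."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # remove the DC term (average skin brightness)
    best_k, best_mag = None, -1.0
    for k in range(1, n // 2):
        if low_hz <= k * fs / n <= high_hz:
            mag = abs(sum(c * cmath.exp(-2j * math.pi * k * t / n)
                          for t, c in enumerate(centered)))
            if mag > best_mag:
                best_k, best_mag = k, mag
    if best_k is None:
        raise ValueError("record too short to resolve the heart-rate band")
    return 60.0 * best_k * fs / n


# Synthetic 20 s trace at 30 frames/s: a 1.2 Hz pulse riding on a stronger 0.1 Hz drift
fs = 30.0
trace = [0.5 * math.sin(2 * math.pi * 1.2 * t / fs) + 2.0 * math.sin(2 * math.pi * 0.1 * t / fs)
         for t in range(600)]
rate = pulse_rate_bpm(trace, fs)
```

The slow drift is stronger than the pulse but falls outside the search band, so the estimate is 72 bpm; restricting the band is a simple way to reject illumination and motion trends.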
  • Item
    Designing Optical Quantum Computing with Minimal Hardware
    (2023) Shi, Yu; Waks, Edo; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Photons are indispensable for quantum communication and metrology, but their limited photon-photon interactions make them suboptimal for quantum computing. This thesis explores the use of an atom-photon interface to entangle photons, enabling more scalable optical quantum computing with reduced resource demands. I first discuss the deterministic generation of multi-dimensional cluster states via an atom-photon interface and time-delay feedback. These cluster states are essential resources for fault-tolerant measurement-based quantum computing. A diagrammatic method is introduced to derive tensor networks of highly entangled states, aiding the simulation of states produced from sequential photons. Subsequently, I investigate implementing the optical quantum Fourier transform through the interface, which mediates photon-photon interactions and significantly reduces the dependence on linear optical devices. In addition to devising these techniques, I introduce an error metric for non-trace-preserving quantum operations that is consistent with fault-tolerant quantum computing theory. This metric is useful for assessing errors across various quantum platforms and post-selected protocols. Overall, this research advances optical quantum information processing by proposing scalable, practical approaches to quantum computing. It also introduces novel error metrics, offering a promising benchmarking and optimization strategy for robust quantum information processing.
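To make "cluster state" concrete: in the textbook circuit picture, a linear cluster state is prepared by applying controlled-Z (CZ) gates between neighboring qubits initialized in |+>, and it is defined by the stabilizers K_j = X_j Z_(j-1) Z_(j+1). The sketch simulates this with a plain state vector and checks that every K_j leaves the state unchanged. It uses ideal gates, not the atom-photon interface with time-delay feedback studied in the thesis, and the helper names are hypothetical.

```python
import math


def plus_state(n):
    """|+> on every qubit: a uniform real state vector of length 2**n (bit q of an index = qubit q)."""
    amp = 1 / math.sqrt(2 ** n)
    return [amp] * (2 ** n)


def apply_cz(state, a, b):
    """Controlled-Z between qubits a and b: negate amplitudes where both bits are 1."""
    return [-amp if (i >> a) & 1 and (i >> b) & 1 else amp for i, amp in enumerate(state)]


def apply_x_z(state, x_qubit, z_qubits):
    """Apply X on x_qubit and Z on every qubit in z_qubits (x_qubit must not be in z_qubits)."""
    out = [0.0] * len(state)
    for i, amp in enumerate(state):
        parity = sum((i >> q) & 1 for q in z_qubits) % 2
        out[i ^ (1 << x_qubit)] = -amp if parity else amp
    return out


def is_stabilized(state, j, n):
    """Check that K_j = X_j Z_(j-1) Z_(j+1) (neighbors on a line) leaves the state unchanged."""
    neighbors = [k for k in (j - 1, j + 1) if 0 <= k < n]
    return all(abs(u - v) < 1e-12 for u, v in zip(apply_x_z(state, j, neighbors), state))


# Linear cluster state on 4 qubits: CZ between each neighboring pair of |+> qubits
num_qubits = 4
cluster = plus_state(num_qubits)
for q in range(num_qubits - 1):
    cluster = apply_cz(cluster, q, q + 1)
```

Every K_j stabilizes the resulting state, whereas the unentangled |+> product state is not stabilized by X_0 Z_1; this stabilizer description is what makes cluster states a convenient resource for measurement-based computing.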
  • Item
    Private Information Read-Update-Write with Applications to Distributed Learning
    (2023) Pallegoda Vithana, Sajani; Ulukus, Sennur; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Data privacy has attracted significant interest in information theory with the recent emergence of a wide range of data-driven technologies. Data privacy is compromised primarily when the data origins (private users) download or upload information. This dissertation focuses on how information-theoretic privacy of users' data can be guaranteed when downloading and uploading information, along the lines of private information retrieval (PIR) and private read-update-write (PRUW), which have applications in privacy-preserving distributed learning. First, we consider the problem of semantic PIR, in which multiple non-colluding databases store a number of files, of which a user wishes to download one without revealing its index to any of the databases. Semantic PIR deviates from classical PIR by allowing the files to have arbitrary semantics, such as different file sizes and arbitrary popularity profiles. As the main result of this work, we characterize the capacity of semantic PIR, with achievable schemes and a converse proof. We provide capacity results for semantic PIR with replicated databases, MDS-coded databases, and colluding databases. Since PIR deals with private reading, the second work considers private writing, an immediate conceptual extension of PIR. In the problem of private read-update-write (PRUW), a user downloads, updates, and writes back a specific section of a storage system while guaranteeing information-theoretic privacy of the index of the updated section and the values of the updates. PRUW has applications in distributed learning, such as privacy-preserving federated learning (FL) with sparsification and private federated submodel learning (FSL). In FSL, a machine learning model is divided into multiple submodels based on the different types of data used for training, and a given user only downloads and updates the submodel relevant to the user's local data. 
To guarantee the privacy of the users participating in the FSL process, both the index of the submodel being updated and the values of the updates must be kept private from the server that stores the model. This is achieved by PRUW, where the required submodel is downloaded in the reading phase without revealing the submodel index to the databases (similar to PIR), and the updates are sent back to the databases in the writing phase without revealing the values of the updates or the submodel index. We provide a basic PRUW scheme for private FSL that achieves the lowest total communication cost of private FSL known thus far. In this work, we introduce the concept of combining multiple parameter updates into a single bit in terms of a Lagrange polynomial, in such a way that it can be privately decomposed into the respective individual updates and added to the relevant positions at the server. Third, we consider the problem of private FSL with top-r sparsification, in which user-server communication is significantly reduced by sharing only the most significant fraction r of the parameters and updates in the reading and writing phases. However, this introduces additional privacy requirements, as the positions of the sparse updates/parameters leak information about the users' private data in addition to their values and the submodel index. To this end, we provide a PRUW scheme that performs top-r sparsification in FSL while guaranteeing, through a permutation technique, the information-theoretic privacy of the updating submodel index, the values of the sparse updates, and the positions of the sparse updates/parameters. Fourth, we study random sparsification in FSL, in which the user only downloads and uploads a specific fraction of randomly selected parameters and updates to reduce communication. 
The problem is formulated as a rate-distortion characterization, in which we derive the minimum achievable communication cost for a given amount of allowed distortion. We show that a linear rate-distortion relation is achievable while guaranteeing the information-theoretic privacy of the updating submodel index, the values of the sparse updates, and the positions of the sparse updates/parameters. Fifth, we extend the ideas of PRUW to FL with top-r sparsification. While the permutation technique introduced for FSL with top-r sparsification also applies in this setting, it incurs a large storage cost for FL. To alleviate this, we use a model segmentation mechanism to modify the permutation technique so that the storage cost is reduced at the expense of a certain amount of information leakage. In general, we present the trade-off between communication cost, storage cost, and information leakage in private FL with top-r sparsification, along with achievable schemes with different properties. In all of the above PRUW settings, we require multiple non-colluding databases to store the central model to guarantee information-theoretic privacy of the users' local data. In the sixth work, we consider the practical scenario of private FSL in which the databases storing the central model are allowed to have arbitrary storage constraints. As the main result of this work, we develop a PRUW scheme and a storage mechanism for FSL that efficiently use the available space in each database to store the submodel parameters, minimizing the total communication cost while guaranteeing information-theoretic privacy of the updating submodel index and the values of the updates. The proposed storage mechanism is a hybrid of MDS-coded storage and divided storage, and it finds the optimal MDS codes and fractions of submodels stored in each database for any given set of homogeneous or heterogeneous storage constraints. 
Seventh, we go beyond privacy and consider deception in information retrieval. We introduce the problem of deceptive information retrieval (DIR), a conceptual extension of PIR. In DIR, the user downloads a required file out of multiple files stored in a system of databases while revealing to the databases information that points to a different file. In other words, the user deceives the databases by making their prediction of the required file index incorrect with high probability. We propose a DIR scheme that achieves a given required level of deception, with the goal of minimizing the download cost. The proposed scheme incurs higher download costs than PIR for positive levels of deception and achieves the PIR capacity when the specified level of deception is zero.
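For readers new to PIR, the classic two-server XOR scheme of Chor, Goldreich, Kushilevitz, and Sudan illustrates how a file can be retrieved from non-colluding replicated databases without revealing its index. The user sends a uniformly random subset of file indices to one database and the same subset with the desired index toggled to the other; each query on its own is uniformly random, and XORing the two answers yields the desired file. This textbook scheme is far from capacity-achieving and is not one of the dissertation's semantic PIR, PRUW, or DIR schemes; the class and function names are hypothetical.

```python
import secrets


def xor_blocks(blocks, length):
    """Bytewise XOR of equal-length byte strings (all-zero bytes if blocks is empty)."""
    out = bytes(length)
    for block in blocks:
        out = bytes(a ^ b for a, b in zip(out, block))
    return out


class Database:
    """One of two non-colluding servers, each storing a full replica of equal-length files."""

    def __init__(self, files):
        self.files = files

    def answer(self, query):
        # The server only sees a set of indices and returns the XOR of those files.
        return xor_blocks((self.files[i] for i in query), len(self.files[0]))


def private_retrieve(db1, db2, num_files, index):
    """Fetch files[index]; each server alone sees a uniformly random index set."""
    subset = {i for i in range(num_files) if secrets.randbits(1)}
    answer1 = db1.answer(subset)
    answer2 = db2.answer(subset ^ {index})  # toggle membership of the desired index
    # Every file except files[index] appears in both answers or in neither, so it cancels.
    return xor_blocks([answer1, answer2], len(answer1))


files = [b"alpha", b"bravo", b"delta", b"gamma"]
db1, db2 = Database(files), Database(files)
retrieved = private_retrieve(db1, db2, len(files), 2)
```

The user downloads two file-sized answers to obtain one file, a rate of 1/2; capacity results such as those in the dissertation characterize how much better than this a scheme can do under a given storage and collusion model.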
  • Item
    Design Techniques for Embedded Computer Vision and Signal Processing
    (2023) Lee, Yaesop; Bhattacharyya, Shuvra; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    In this thesis, we explore new design techniques to facilitate the implementation of efficient deep learning systems for embedded computer vision and signal processing. The techniques are developed to address concerns of real-time processing efficiency and energy efficiency under resource-constrained operation, as well as the accuracy considerations conventionally associated with the development of deep learning solutions. We study two specific application areas for efficient deep learning: (1) neural decoding and (2) object detection from multi-view images, such as those acquired from unmanned aerial vehicles (UAVs). To address the challenges of efficient deep learning systems, we apply dataflow-based methods for the design and implementation of signal and information processing systems. Signal-processing-oriented dataflow concepts provide an efficient computational model with the flexibility and expandability needed to facilitate the design and implementation of complex signal and information processing systems. In dataflow modeling, applications are modeled as directed graphs, called dataflow graphs, in which vertices (actors) correspond to discrete computations and edges represent communication between pairs of actors. In the first part of the thesis, we study in depth a recently introduced model of computation, called passive-active flow graphs (PAFGs), which can be used in conjunction with dataflow modeling to facilitate more efficient implementation of dataflow graphs. In the second part of the thesis, we apply dataflow techniques to develop a novel system for real-time neural decoding. Neural decoding involves processing signals acquired from the brain (for example, through calcium imaging technology) to predict behavior variables. We refer to the developed system as the Neuron Detection and Signal Extraction Platform (NDSEP).
NDSEP incorporates streamlined subsystems for neural decoding that are integrated efficiently through dataflow modeling. The dataflow-based software architecture of NDSEP provides the modularity and extensibility needed to experiment with alternative modules for neural signal processing. Our system design also facilitates optimization of trade-offs between accuracy and real-time performance. Additionally, we explore various factors beyond dataflow-based system design to develop efficient deep learning systems for embedded computer vision and signal processing. In the third part of the thesis, we address the problem of limited training data, a significant problem for many application areas of embedded computer vision, especially areas that are highly specialized or at the very forefront of computer vision technology. We address this problem specifically in the context of deep learning for object detection from multi-view images acquired from unmanned aerial vehicles (UAVs). To help overcome the shortage of relevant training data in this class of object detection scenarios, we introduce a new dataset and associated metadata that integrate real and synthetic data to provide a much larger collection of labeled data than is available from real data alone. We also apply the developed dataset to conduct comprehensive studies of how the critical attributes of UAV-based images affect machine learning models, and how these insights can be applied to advance the training and testing of the models. Moreover, in the fourth part of the thesis, we explore fundamental algorithm development for efficient object detection from multi-view images. In this work, we propose a simplified 2-dimensional object detection technique that leverages multiple images of a scene. This work provides a simple but effective way to extend a detection architecture for single-view images to an architecture for multi-view images.
A useful feature of the proposed approach is that it requires only a minimal amount of additional computation to extend an architecture from single-view to multi-view operation. In the fifth part of the thesis, we develop a novel approach to online learning, the Recursive Online Neural DecOding (RONDO) framework, which is tailored for portable neural decoding systems, where computational resource constraints and energy efficiency are important concerns in addition to knowledge extraction accuracy. The characteristics of brain imaging signals may change significantly over time, making online learning an important tool for robust neural decoding. In online learning, the underlying machine learning model is updated dynamically as new input is received by the system. In this work, we build upon existing understanding of recurrent neural network (RNN) algorithms and introduce a new RNN-based online learning framework that provides robust, energy-efficient neural decoding on resource-constrained platforms. RONDO provides novel trade-offs between neural decoding accuracy and the energy consumed by the computationally intensive retraining rounds that are needed to update the underlying RNN model when the characteristics of the input signal change significantly.
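The dataflow model described in this abstract treats an application as a directed graph whose actors fire when their input edges hold data and whose edges buffer tokens between actors. The Python sketch below illustrates that execution semantics under a homogeneous-rate simplification (each firing consumes and produces exactly one token per edge) with a simple first-enabled-actor schedule. The names Actor and run are hypothetical; this is not the PAFG model or the NDSEP implementation from the thesis.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, Iterable, List, Optional, Tuple


@dataclass
class Actor:
    """A dataflow actor (graph vertex); one token per edge per firing."""
    name: str
    inputs: List[str]              # names of incoming edges
    outputs: List[str]             # names of outgoing edges
    compute: Callable[..., tuple]  # one arg per input edge, one result per output edge

    def can_fire(self, edges: Dict[str, Deque]) -> bool:
        # Firing rule: every input edge must hold at least one token.
        return all(edges[e] for e in self.inputs)


def run(actors: List[Actor], edge_names: Iterable[str],
        initial: Optional[Dict[str, list]] = None,
        max_firings: int = 1000) -> Tuple[Dict[str, Deque], List[str]]:
    # Each edge is a FIFO buffer; source data is modeled as preloaded tokens.
    edges: Dict[str, Deque] = {e: deque() for e in edge_names}
    for e, tokens in (initial or {}).items():
        edges[e].extend(tokens)
    trace: List[str] = []
    for _ in range(max_firings):
        ready = [a for a in actors if a.can_fire(edges)]
        if not ready:
            break  # no enabled actor: execution terminates
        actor = ready[0]  # static priority: first enabled actor in list order
        args = [edges[e].popleft() for e in actor.inputs]
        for e, token in zip(actor.outputs, actor.compute(*args)):
            edges[e].append(token)
        trace.append(actor.name)
    return edges, trace


if __name__ == "__main__":
    pipeline = [
        Actor("scale", ["in"], ["mid"], lambda x: (2 * x,)),
        Actor("offset", ["mid"], ["out"], lambda x: (x + 1,)),
    ]
    edges, trace = run(pipeline, ["in", "mid", "out"], initial={"in": [1, 2, 3]})
    print(list(edges["out"]), trace)
    # [3, 5, 7] ['scale', 'scale', 'scale', 'offset', 'offset', 'offset']
```

Because actors interact only through edges, swapping one module for another (for example, an alternative signal-extraction actor) does not disturb the rest of the graph, which is the kind of modularity the abstract attributes to dataflow-based design.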
  • Item
    Representation Learning For Large-scale Graphs
    (2023) Jin, Yu; JaJa, Joseph; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Graphs are widely used to model object relationships in real-world applications such as biology, neuroscience, and communication networks. Traditional graph analysis tools focus on extracting key graph patterns to characterize graphs, which are then used in downstream tasks such as prediction. Common graph characteristics include global and local graph measurements such as the clustering coefficient, global efficiency, characteristic path length and diameter. Many applications involve high-dimensional, large-scale graphs for which existing methods, which rely on a small number of graph characteristics, cannot efficiently perform graph tasks. A major research challenge is to learn graph representations that can be used to efficiently perform graph tasks as required by a wide range of applications. In this thesis, we have developed a number of novel methods to tackle the challenges associated with processing or representing large-scale graphs. In the first part, we propose a general graph coarsening framework that maps large graphs into smaller ones while preserving important structural graph properties. Based on spectral graph theory, we define a novel distance function that measures the differences between the graph spectra of the original and coarse graphs. We show that the proposed spectral distance sheds light on the structural differences in the graph coarsening process. In addition, we propose graph coarsening algorithms that aim to minimize the spectral distance, with provably strong bounds. Experiments show that the proposed algorithms outperform previous graph coarsening methods in applications such as graph classification and stochastic block recovery tasks. In the second part, we propose a new graph neural network paradigm that improves the expressiveness of the best known graph representations. Graph neural network (GNN) models have recently been introduced to solve challenging graph-related problems.
Most GNN models follow the message-passing paradigm, where node information is propagated through edges and graph representations are formed by aggregating node representations. Despite their successes, message-passing GNN models are limited in their expressive power and fail to capture basic characteristic properties of graphs. In our work, we represent graphs as the composition of sequence representations. Through the design of sequence sampling and modeling techniques, the proposed graph representations achieve provably powerful expressiveness while maintaining permutation invariance. Empirical results show that the proposed model achieves superior results in real-world graph classification tasks. In the third part, we develop a fast implementation of spectral clustering methods on CPU-GPU platforms. Spectral clustering is one of the most popular graph clustering algorithms and has achieved state-of-the-art performance in a wide range of applications. However, existing implementations in commonly used software platforms such as Matlab and Python do not scale well for many of the emerging Big Data applications. We present a fast implementation of the spectral clustering algorithm on a CPU-GPU heterogeneous platform. Our implementation takes advantage of the computational power of the multi-core CPU and the massive multithreading capabilities of GPUs. We show that the new implementation achieves significantly faster computation than previous implementations on a wide range of tasks. In the fourth part, we study structural brain networks derived from Diffusion Tensor Imaging (DTI) data. The processing of DTI data, coupled with modern tractographic methods, reveals white matter fiber connectivity at a relatively high resolution; this allows us to model the brain as a structural network that encodes pairwise connectivity strengths between brain voxels.
We have developed an iterative method to delineate the brain cortex into fine-grained connectivity-based brain parcellations. This allows us to map the initial large-scale brain network into a relatively small weighted graph that preserves the essential structural connectivity information. We show that graph representations based on the brain networks from the new parcellations are more powerful in discriminating between different population groups than those based on existing brain parcellations.
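The expressiveness limitation of message-passing GNNs mentioned in this abstract can be seen on a standard small example: a 6-cycle and two disjoint triangles are not isomorphic, but when all nodes start with the same feature and every node has degree 2, any update that depends only on a node's own state and the multiset of its neighbors' states produces identical node states in both graphs. The Python sketch below uses a hypothetical, weight-free sum-aggregation update to show this; it illustrates the limitation only and is not the sequence-based model proposed in the thesis.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]  # adjacency lists of an undirected graph


def cycle(n: int, offset: int = 0) -> Graph:
    """Cycle graph on nodes offset, ..., offset + n - 1."""
    return {offset + i: [offset + (i - 1) % n, offset + (i + 1) % n] for i in range(n)}


def message_passing_readout(graph: Graph, rounds: int) -> float:
    # No node labels: every node starts with the same feature value.
    h = {v: 1.0 for v in graph}
    for _ in range(rounds):
        # Update: own state plus the sum of the neighbors' states (messages).
        h = {v: h[v] + sum(h[u] for u in graph[v]) for v in graph}
    # Graph-level representation: permutation-invariant sum over node states.
    return sum(h.values())


if __name__ == "__main__":
    six_cycle = cycle(6)                                # one connected component
    two_triangles = {**cycle(3), **cycle(3, offset=3)}  # two components
    print(message_passing_readout(six_cycle, 3),
          message_passing_readout(two_triangles, 3))
    # 162.0 162.0
```

Adding more rounds or learned weights does not help here, because the two graphs look locally identical from every node; distinguishing them requires information beyond neighborhood aggregation, which is the gap that more expressive representations aim to close.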