Electrical & Computer Engineering Theses and Dissertations

Permanent URI for this collection: http://hdl.handle.net/1903/2765

Search Results

Now showing 1 - 10 of 40
  • Item
    FROM PARTS TO WHOLE IN ACTION AND OBJECT UNDERSTANDING
    (2024) Devaraj, Chinmaya; Aloimonos, Yiannis; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The traditional paradigm of supervised learning in action or object recognition often relies on a top-down approach, ignoring explicit modeling of what activities or objects consist of. Recent approaches in generative AI research have shown us the ability to generate images and videos using text, indirectly indicating that we have control over the constituents of images and videos. In this dissertation, we explore ways to use the constituents of actions to develop methods to improve understanding of action. We devise different approaches to utilize the parts of actions, namely object motion, object state changes, and motion descriptions obtained by LLMs, in tasks such as next active object segmentation, zero-shot action recognition, and video-text retrieval. We show promising benefits in action anticipation, zero-shot action recognition, and text-video retrieval tasks, demonstrating the practical applications of our methods. In the first part of the dissertation, we explore the idea of using the constituents of actions in GCNs for zero-shot human-object action recognition. The main idea is that semantically similar actions (of similar constituents) are closer in feature space. Thus, in our graph, we encode the edges connecting those actions with higher similarity. We introduce a method to visually ground the external knowledge graph using the concept of shared similarity between similar actions. We evaluate the method on the EPIC Kitchens dataset and the Charades dataset, showing impressive results over baseline methods. We further show that visually grounding the knowledge graph enhances the performance of GCNs when an adversarial attack corrupts the input graph. In the second part of the thesis, we extend our ideas on human-object interactions in first-person videos.
Human actions involving hand manipulations are structured according to the making and breaking of hand-object contact, and human visual understanding of action relies on anticipation of contact, as demonstrated by pioneering work in cognitive science. Taking inspiration from this, we introduce representations and models centered on contact, which we then use in action prediction and anticipation. We train the Anticipation Module, a module producing Contact Anticipation Maps and Next Active Object Segmentations, novel low-level representations providing temporal and spatial characteristics of anticipated near-future action. On top of the Anticipation Module, we apply Egocentric Object Manipulation Graphs (Ego-OMG), a framework for action anticipation and prediction. Using the Anticipation Module to aid Ego-OMG produces state-of-the-art results, achieving first and second places on the unseen and seen test sets of the EPIC Kitchens Action Anticipation Challenge and achieving state-of-the-art results on action anticipation and action prediction over EPIC Kitchens. In the same line of thinking about the constituents of action, we next focus on investigating how motion understanding can be modeled in current video-text models. We introduce motion descriptions generated by GPT4 on three action datasets that capture fine-grained motion descriptions of activities. We evaluated several video-text models on the task of retrieval of motion descriptions and found that they lag behind human expert performance. We introduce a method of improving motion understanding in video-text models by utilizing motion descriptions. This method is demonstrated on two action datasets for the motion description retrieval task. The results draw attention to the need for quality captions involving fine-grained motion information in existing datasets and demonstrate the effectiveness of the proposed pipeline in understanding fine-grained motion during video-text retrieval.
  • Item
    TOWARDS EXTENDING ACOUSTIC-TO-ARTICULATORY SPEECH INVERSION AND LEARNING ARTICULATORY REPRESENTATIONS
    (2023) H P Elapatha Rajapaksha Siriwardena, Yashish Maduwantha; Espy-Wilson, Carol; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Acoustic-to-articulatory speech inversion involves the challenging task of deducing the kinematic state of various constriction synergies, including the lips, tongue tip, tongue body, velum, and glottis, based on their respective constriction degree and location coordinates. These coordinates are referred to as vocal tract variables (TVs). The development of Speech Inversion (SI) systems has gained attention in recent years, mainly due to their potential in a wide range of speech applications like Automatic Speech Recognition (ASR), speech synthesis, speech therapy, and mental health assessments. Over the past few years, deep neural network (DNN) based models have propelled the development of SI systems to new heights. However, current SI systems still struggle with the lack of sufficiently large articulatory datasets, speaker dependence, poor performance with noisy speech, and the lack of generalizability across different articulatory datasets. Moreover, one of the major drawbacks of the existing articulatory datasets is the lack of ground-truth data capturing velar and glottal activity of speech. With this work, we try to address some of the aforementioned challenges pertaining to the development of effective SI systems. Our experiments are based on two publicly available articulatory datasets: the University of Wisconsin X-ray microbeam (XRMB) dataset and the HPRC dataset. We show that the use of appropriate audio augmentation techniques to synthetically create data can further improve the performance of SI systems on both clean and noisy speech data. We also show that the use of multi-task learning frameworks to carry out an auxiliary but related task can also improve the TV prediction. A key improvement came about when the SI systems were forced to learn source features (aperiodicity, periodicity, and pitch) as additional targets.
Moreover, the use of self-supervised speech representations (HuBERT) and fine-tuning them to the downstream task of speech inversion resulted in improved performance. With the aim of extending the current SI systems to estimate velar and glottal activity, data from an ongoing data collection was used to derive and validate two parameters: nasalance to capture velar constriction degree and electroglottography (EGG) envelope to capture voicing. A separate speaker-independent SI system was subsequently trained to estimate the derived parameters and is one of the first systems to achieve the feat. This SI system, along with the conventional SI systems (trained to estimate lip and tongue TVs), provides a framework to estimate a complete articulatory representation of speech in a speaker-independent fashion. While improving and extending the current SI frameworks, we also explored an unsupervised learning algorithm inspired by sensorimotor interactions in the human brain to perform audio and speech inversion. The proposed “MirrorNet”, a constrained autoencoder architecture, is first used to learn, in an unsupervised manner, the controls of an off-the-shelf audio synthesizer (DIVA) to produce melodies only from their auditory spectrograms. The results demonstrate how the MirrorNet discovers the synthesizer parameters to generate melodies that closely resemble the originals, including unseen melodies, and even determines the best set of parameters to approximate renditions of complex piano melodies generated by a different synthesizer. To extend the same idea of learning to vocal tract controls for speech, we developed a DNN based articulatory synthesizer (articulatory-to-acoustic forward mapping) to be incorporated as the motor plant of the MirrorNet.
The MirrorNet with this motor plant, once initialized with a minimal amount of ground-truth data (~30 minutes of speech), can learn the articulatory representations (6 TVs + source features) with significantly better accuracy. Overall, this highlights the effectiveness and power of the MirrorNet’s learning algorithm in solving the conventional acoustic-to-articulatory speech inversion problem with minimal use of ground-truth articulatory data. In order to assess the practical utility of articulatory representations in real-world scenarios, we employed articulatory coordination features derived from TVs to detect and analyze articulatory-level alterations in the speech of individuals with schizophrenia. We show that schizophrenia subjects with strong positive symptoms (e.g. hallucinations and delusions), and who are markedly ill, exhibit a more complex articulatory coordination pattern in facial and speech gestures compared to healthy controls. This distinction in speech coordination pattern is used to train a multimodal convolutional neural network (CNN) which uses video and audio data to distinguish schizophrenia subjects from healthy controls. Furthermore, we used TVs estimated by the best performing SI system to detect mispronunciation of /ɹ/, a common speech sound disorder in children. The classification model trained with TVs outperformed the state-of-the-art hand-crafted age-and-sex normalized formants. In essence, the work in this dissertation presents steps taken towards developing effective acoustic-to-articulatory speech inversion frameworks, and highlights the importance of utilizing articulatory representations in real-world applications.
  • Item
    AGE OF INCORRECT INFORMATION: A NEW PERFORMANCE METRIC IN SEMANTIC COMMUNICATIONS
    (2023) Chen, Yutao; Ephremides, Anthony; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    With the increasing popularity of smart devices and the rapid development of networking and communication technologies, cyber-physical system applications have been widely deployed and are receiving increasing attention. Some examples of these systems include vehicle networks, where vehicles collect real-time external information through their on-board sensors and cameras to generate a reliable description of the surroundings; intelligent transportation systems, where real-time monitoring of road conditions and traffic congestion is essential; and natural or man-made disaster prevention and management, where real-time monitoring of omens and disaster propagation is crucial. A common feature of these systems is the high requirement for the timeliness of the acquired information, which has led to the development of optimization frameworks aimed at capturing information freshness. Age of Information (AoI) is a prime example, but it has the drawback of only considering information freshness and ignoring the importance of content. As a result, the Age of Incorrect Information (AoII) has been developed to capture both the freshness and content of information. In this dissertation, we study the fundamental nature and optimization of AoII in numerous systems. With the proliferation of smart devices, energy consumption has become a major concern. In the first part, we focus on the characteristics and performance of AoII under limited resources. In particular, we propose an efficient algorithm to obtain the AoII-optimal policy under resource-constrained conditions and compute the performance of the optimal policies. The massive connectivity of communication systems has made scheduling a hot research topic. In the second part, we analyze and optimize the performance of AoII in the scheduling problem. We present the Whittle's index policy for AoII, whose superior performance has been recognized in many other problems. However, it also has limitations. 
Therefore, we propose a new scheduling policy, the indexed priority policy, which has comparable performance to the Whittle's index policy but has broader applicability. With the unprecedented increase in the amount and types of data to be transmitted and the impact of external factors such as urban construction, data transmission will experience numerous uncertainties. Therefore, in the third part, we study the characteristics and optimization of AoII in an environment with random delays. Specifically, in the first half, we consider the case where the communication channel suffers from a random delay. In the second half, we build on the first half and consider the case where the transmitter has preemption capability. For both halves, we precisely compute the performance of some canonical policies and theoretically find the optimal policies, which lay the foundation for further generalization and application of AoII.
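To make the difference between freshness-only and content-aware metrics concrete, the following is a minimal sketch of AoII in discrete time. It assumes a linear time penalty, so that AoII counts the consecutive slots during which the receiver's estimate differs from the source state; this is one common formulation and not necessarily the exact penalty used in the dissertation. The function name `aoii_trace` and the sample sequences are hypothetical.

```python
from typing import List, Sequence


def aoii_trace(source: Sequence[int], estimate: Sequence[int]) -> List[int]:
    """Return the per-slot AoII under an assumed linear time penalty.

    source[t]   -- true state of the monitored process in slot t
    estimate[t] -- receiver's estimate of that state in slot t
    """
    trace = []
    age = 0
    for x, x_hat in zip(source, estimate):
        # The penalty resets whenever the estimate is correct and
        # otherwise grows by one for each slot spent in error.
        age = 0 if x == x_hat else age + 1
        trace.append(age)
    return trace


# Hypothetical example: the estimate lags behind two source transitions.
source_states = [0, 0, 1, 1, 1, 0, 0]
receiver_estimates = [0, 0, 0, 0, 1, 1, 0]
trace = aoii_trace(source_states, receiver_estimates)
average_aoii = sum(trace) / len(trace)
```

Unlike AoI, the value here stays at zero for as long as the receiver holds correct information, regardless of how old its last update is.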
  • Item
    Distributed Control for Formula SAE-Type Electric Vehicle
    (2022) Falco, Samantha Rose; Khaligh, Alireza; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The recent trend in transportation electrification creates an enormous increase in demand for electric vehicles (EVs). Increasingly, electric cars have novel features like autonomous driving and fault tolerance, all of which require additional hardware and computation power. Changes to the electronic control unit (ECU) structure will be needed to make these advances scalable. This thesis examines the driving economic, technical, and societal factors behind needed changes to the existing control structures. It proposes a control platform design to address issues of complexity and scalability. A generic, modular control board structure using the TMS320F2837xS digital signal processor (DSP) is described with several input/output functionalities including a wide range of analog inputs, multiple logic levels for digital pins, CAN communication, and wireless communication capabilities. A distributed control network is built by interconnecting multiple implementations of the control board, each of which has distinct responsibilities dictated by software instead of hardware. A prototype electric vehicle control structure for a Formula SAE electric vehicle was built utilizing a network of three control boards and tested to prove the viability of the proposed concept. Results of these tests and future steps for the project are discussed.
  • Item
    MEASURING AND TRAPPING QUASIPARTICLES IN SUPERCONDUCTING COPLANAR WAVEGUIDE RESONATORS
    (2021) Alexander, Ashish; Goldhar, Julius; Richardson, Christopher; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Measuring the internal quality factor of coplanar waveguide superconducting resonators is an established method of determining small losses in superconducting devices. Traditionally, the resonator losses are only attributed to two-level system (TLS) defects using a power dependent model for the quality factor. However, excess non-equilibrium quasiparticles can also limit the quality factor of the planar superconducting resonators used in circuit quantum electrodynamics. At millikelvin temperatures, quasiparticles can be generated by breaking Cooper pairs via high-energy particles or sub-gap microwave photons from the measurement signal. In this thesis, I developed a two-temperature, power, and temperature dependent model to evaluate resonator losses for isolating TLS and quasiparticle loss simultaneously. The model combines a standard TLS model with a new modified two-temperature quasiparticle model where the driven quasiparticle density is defined by an effective temperature that may be different than the bath temperature. This model also explores the power and temperature dependence of the internal quality factor. To investigate the model, resonators were fabricated from epitaxial molecular beam epitaxy-grown aluminum and titanium nitride grown on float-zone refined silicon. The resonators have high quality factors above 1M. The presented model is used to determine that the analyzed TiN resonator had comparable TLS and quasiparticle loss at low power and low temperature, while the low-temperature Al resonator behavior was dominated by non-equilibrium quasiparticle loss. Additionally, a small bandgap superconductor in contact with a larger bandgap superconductor as a quasiparticle trap is also explored. The quasiparticles can be confined away from the larger bandgap superconductor into the smaller one. Here, Al and TiN were used as the two superconductors.
Finite difference method (FDM) simulations of the coupled phonon and quasiparticle systems of both superconductors are performed, suggesting that the quasiparticle traps on the ground plane may be effective for setback distances less than 200 μm away from TiN waveguide features. Experimentally, a thin layer of Al is grown in-situ on TiN using molecular beam epitaxy (MBE) with a negligible dielectric layer between the two superconductors to increase the trapping efficiency of the Al. The quarter-wavelength resonators in TiN with an Al layer with varying setback distances (1 µm – 150 µm) from the active region of the TiN were also fabricated using custom-designed mask sets. These devices are then analyzed for different powers and temperatures. The resonators with setback greater than 20 µm outperform the plain TiN resonators at low temperatures. The device with a setback of 150 µm had 1.5x the quality factor at medium powers at low temperatures.
  • Item
    Measuring and Mitigating Potential Risks of Third-party Resource Inclusions
    (2021) Indela, Soumya; Levin, Dave; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    In today's computer services, developers commonly use third-party resources like libraries, hosting infrastructure and advertisements. Using third-party components improves the efficiency and enhances the quality of developing custom applications. However, services that use third-party resources inherit their vulnerabilities along with their benefits. Unfortunately, developers are often uninformed about these risks, and as a result their services are susceptible to various attacks. There has been a lot of work on how to develop first-hand secure services. The key focus in my thesis is quantifying the risks in the inclusion of third-party resources and looking into possible ways of mitigating them. Based on the fundamental ways that risks arise, we broadly classify them into Direct and Indirect Risks. Direct risk is the risk that comes with invoking the third-party resource incorrectly, even if the third party is otherwise trustworthy, whereas indirect risk is the risk that comes with the third-party resource potentially acting in an untrustworthy manner, even if it were invoked correctly. To understand the security related direct risks in third-party inclusions, we study cryptographic frameworks. Developers often use these frameworks incorrectly and introduce security vulnerabilities. This is because current cryptographic frameworks erode abstraction boundaries, as they do not encapsulate all the framework-specific knowledge and expect developers to understand security attacks and defenses. Starting from the documented misuse cases of cryptographic APIs, we infer five developer needs and we show that a good API design would address these needs only partially. Building on this observation, we propose APIs that are semantically meaningful for developers.
We show how these interfaces can be implemented consistently on top of existing frameworks using novel and known design patterns, and we propose build management hooks for isolating security workarounds needed during the development and test phases. To understand the performance related direct risks in third-party inclusions, we study resource hints in webpage HTML. Today's websites involve loading a large number of resources, resulting in a considerable amount of time issuing DNS requests, requesting resources, and waiting for responses. As an optimization for these time sinks, websites may load resource hints, such as DNS prefetch, preconnect, preload, pre-render, and prefetch tags in their HTML files to cause clients to initiate DNS queries and resource fetches early in their web-page downloads before encountering the precise resource to download. We explore whether websites are making effective use of resource hints using techniques based on the tool we developed to obtain a complete snapshot of a webpage at a given point in time. We find that many popular websites are highly ineffective in their use of resource hints, causing clients to query and connect to extraneous domains and download unnecessary data, and that some may even use resource hints to bypass ad blockers. To evaluate the indirect risks, we study the web topology. Users who visit benign, popular websites are unfortunately bombarded with malicious popups, malware-loading sites, and phishing sites. The questions we want to address here are: Which domains are responsible for such malicious activity? At what point in the process of loading a popular, trusted website does the trust break down to loading dangerous content? To answer these questions, we first determine what third-party resources websites load (both directly and indirectly). I present a tool that constructs the most complete map of a website’s resource-level topology to date.
This is surprisingly nontrivial; most prior work used only a single run of a single tool (e.g., Puppeteer or Selenium), but I show that this misses a significant fraction of resources. I then apply my tool to collect the resource topology graphs of 20,000 websites from the Alexa ranking, and analyze them to understand which third-party resource inclusions lead to malicious resources. I believe that these third-party inclusions are not always constant or blocked by existing ad blockers. We argue that greater accountability of these third parties can lead to a safer web.
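The resource hints discussed in this abstract are declared as `<link>` tags whose `rel` attribute names the hint. The sketch below counts such declarations in a static HTML string using Python's standard `html.parser`. It is only an illustration of what a hint looks like in markup, not the snapshot tool described in the thesis, and it does not measure whether a hint is used effectively. The class name, function name, and sample page are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser

# rel values corresponding to the hint types named in the abstract.
RESOURCE_HINTS = {"dns-prefetch", "preconnect", "preload", "prerender", "prefetch"}


class ResourceHintCounter(HTMLParser):
    """Count <link rel="..."> resource hints in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hints = Counter()

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        rel = dict(attrs).get("rel") or ""
        # rel may hold several space-separated tokens.
        for token in rel.lower().split():
            if token in RESOURCE_HINTS:
                self.hints[token] += 1


def count_resource_hints(html: str) -> Counter:
    parser = ResourceHintCounter()
    parser.feed(html)
    parser.close()
    return parser.hints


# Hypothetical page: three hints and one ordinary stylesheet link.
SAMPLE_PAGE = """
<html><head>
<link rel="dns-prefetch" href="//cdn.example.com">
<link rel="preconnect" href="https://fonts.example.com">
<link rel="preload" href="/app.js" as="script">
<link rel="stylesheet" href="/site.css">
</head><body></body></html>
"""
hints = count_resource_hints(SAMPLE_PAGE)
```

A static count like this only shows which hints a page declares; judging effectiveness, as the thesis does, would also require observing which hinted domains and resources the page actually goes on to use.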
  • Item
    A GALLIUM NITRIDE INTEGRATED ONBOARD CHARGER
    (2020) Zou, Shenli; Khaligh, Alireza; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Compared to Silicon metal–oxide–semiconductor field-effect transistors (MOSFETs), Gallium Nitride (GaN) devices offer significantly lower gate charge and output capacitance, and zero reverse recovery charge, enabling higher switching frequency operation and efficient power conversion. GaN devices are gaining momentum in power electronic systems such as electric vehicle (EV) charging systems, due to their promise to significantly enhance the power density and efficiency. In this dissertation, a GaN-based integrated onboard charger (OBC) and auxiliary power module (APM) is proposed for EVs to ensure high efficiency, high frequency, high power density, and capability of bidirectional operation. The high switching frequency operation enabled by the GaN devices and the integration of OBC and APM bring many unique challenges, which are addressed in this dissertation. An important challenge is the optimal design of high-frequency magnetics for a high-frequency GaN-based power electronic interface. Another challenge is to achieve power flow management among three active ports while minimizing the circulating power. Furthermore, the impact of circuit layout parasitics could significantly deteriorate the system interface, due to the sensitivity of GaN device switching characteristics. In this work, the aforementioned challenges have been addressed. First, a comprehensive analysis of the front-end AC-DC power factor correction stage is presented, covering a detailed magnetic modeling technique to address the high-frequency magnetics challenge. Second, the modeling and control of a three-port DC-DC converter, interfacing the AC-DC stage, high-voltage traction battery and low-voltage battery, are discussed to address the power flow challenge. Advanced control methodologies are developed to realize power flow management while maintaining minimum circulating power and soft switching.
Furthermore, a new three-winding high-frequency transformer design with improved power density and efficiency is achieved using a genetic-algorithm-based optimization approach. Finally, a GaN-based integrated charger prototype is developed to validate the proposed theoretical hypothesis. The experimental results showed that the GaN-based charging system has the capability of achieving simultaneous charging (G2B) of both HV and LV batteries with a peak efficiency of 95%.
  • Item
    DEEP NEURAL NETWORK FOR DEREVERBERATION
    (2019) Jiao, Yang; Duraiswami, Ramani; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Recently, deep neural networks have achieved incredible success in the areas of computer vision and natural language processing. The research topics under the umbrella of speech enhancement have embraced this opportunity for rapid progress. Dereverberation, one such topic, has received less attention than other speech enhancement tasks such as the cocktail party problem and speech enhancement in open areas. Our aim is to extend deep learning methods into the domain of dereverberation. We apply a neural network method that has been successful on a similar speech enhancement task to dereverberation, specifically a time-frequency mask supported GEV beamformer. Compared to hand-engineered methods, this data-driven approach yields features that are transferable to related tasks. Our experiments illustrate that the original framework, which arose from open-area enhancement tasks, proves effective in our enclosed-space tasks.
  • Item
    HIGH EFFICIENCY CIS SOLAR CELLS BY A SIMPLE TWO-STEP SELENIZATION PROCESS AND WAVEGUIDE BRAGG GRATINGS IN INTEGRATED PHOTONICS
    (2019) Zhang, Yang; Dagenais, Mario; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Part I: High Efficiency CIS Solar Cells with Simple Fabrication Method. CIS has a very high optical absorption coefficient, which makes it able to absorb more than 90% of the incident photons with energies higher than 1.04 eV within 1-2 µm thickness. Because of the high absorption coefficient and low bandgap, high quality CIS solar cells can have a very high short circuit current compared with other thin film materials or other types of solar cells. We offer a very simple two-step process based on annealing stacked elemental layers under selenium vapor within a graphite box, followed by a potassium fluoride postdeposition treatment, which is a low-cost and highly manufacturable approach. We are able to reproducibly achieve above 12% conversion efficiency, with the champion cell exhibiting near-record 14.7% efficiency. Our results indicate that perhaps the CIS system is less sensitive to elaborate processing steps and details than previously thought. This simple approach offers a very useful experimental platform from which to study a variety of thin film PV research topics, including the possibility of producing tandem solar cells by also using perovskite. Part II: Waveguide Bragg Gratings in Integrated Photonics. Integrated photonics on silicon-based material combines two great inventions of the last century: silicon technology and photonic technology. It is paving the way for a monolithically integrated optoelectronic platform on a single chip. Being a prevailing research topic in the past decade, it has seen tremendous progress with the successful development of high-performance components. Among all integrated photonics platforms, the silicon nitride planar waveguide platform provides benefits like low optical losses, transparency over a wide wavelength range (400-2350 nm), compatibility with CMOS and wafer-scale foundry processes, and high-power handling capabilities.
In this part, waveguide Bragg gratings are investigated to improve the performance of several integrated photonics components. An 83-dB rejection ratio pump filter using a periodic waveguide Bragg grating with an efficient z-shape waveguide design to suppress the TM mode and avoid scattered modes is demonstrated. Fabry-Perot cavity enhanced four-wave mixing devices are optimized based on a numerical model developed with an ABCD matrix method, and four-wave mixing in a Fabry-Perot cavity that uses gratings is demonstrated experimentally. Finally, to reduce the pixel size and power consumption of optical phased arrays for virtual reality applications, complex waveguide Bragg gratings are generated via both a Layer Peeling/Adding algorithm and a genetic algorithm to support slow-light modes over a certain bandwidth.
  • Item
    An Integrated Single-phase On-board Charger
    (2019) Lu, Jiangheng; Khaligh, Alireza; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    With the growing demand for transportation electrification, plug-in electric vehicles (PEVs) and plug-in hybrid electric vehicles (PHEVs), collectively called electric vehicles (EVs), are drawing more and more attention. The on-board charger (OBC), which is the power electronics interface between the power grid and the high voltage traction battery, is an important part of EV charging. Besides the OBC, every EV is equipped with another separate power unit called the auxiliary power module (APM) to charge the low voltage (LV) auxiliary battery, which supplies all the electronics on the car, including audio, air conditioner, lights and controllers. The main target of this work is a novel way to integrate both units to achieve a charger design that is not only capable of bi-directional operation with high efficiency, but also offers higher gravimetric and volumetric power density than the existing OBCs and APMs combined. To achieve this target, the following contributions are made: (i) a three-port integrated DC/DC converter, which combines the OBC and APM through an innovative integration method; (ii) an innovative zero-crossing current spike compensation for interleaved totem pole power factor correction (PFC); and (iii) a new phase-shift based control strategy to achieve regulated power flow management with minimum circulating losses.