Electrical & Computer Engineering Theses and Dissertations

Permanent URI for this collection: http://hdl.handle.net/1903/2765

Search Results

Now showing 1 - 10 of 16
  • Item
    Studies in Differential Privacy and Federated Learning
    (2024) Zawacki, Christopher Cameron; Abed, Eyad H; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    In the late 20th century, Machine Learning underwent a paradigm shift from model-driven to data-driven design. Rather than relying on field-specific models, researchers drew on advances in sensors, data storage, and computing power to collect increasing amounts of data. The abundance of new data allowed researchers to fit flexible models directly to observed data. The influx of information made possible numerous advances, including the development of novel medicines, increases in the efficiency of markets, and the proliferation of vast sensor networks. However, not all data should be freely accessible. Sensitive medical records, personal finances, and private IDs are currently stored on digital devices across the world with the expectation that they remain private. At the same time, such data is frequently instrumental in the development of predictive models. Since the beginning of the 21st century, researchers have recognized that traditional methods of anonymizing data are inadequate for protecting client identities. This dissertation's primary focus is the advancement of two fields of data privacy: Differential Privacy and Federated Learning. Differential Privacy is one of the most successful modern privacy methods. By injecting carefully structured noise into a dataset, Differential Privacy obscures individual contributions while allowing researchers to extract meaningful information from the aggregate. Within this methodology, the Gaussian mechanism is one of the most common privacy mechanisms due to its favorable properties, such as the ability of each client to apply noise locally before transmission to a server. However, this mechanism yields only an approximate form of Differential Privacy. This dissertation introduces the first in-depth analysis of the Symmetric alpha-Stable (SaS) privacy mechanism, demonstrating its ability to achieve pure Differential Privacy while retaining local applicability. Based on these findings, the dissertation advocates for using the SaS privacy mechanism to protect client data. Federated Learning is a sub-field of Machine Learning that trains models across a collection (federation) of client devices. This approach aims to protect client privacy by limiting the type of information that clients transmit to the server. However, this distributed environment poses challenges such as non-uniform data distributions and inconsistent client update rates, which reduce the accuracy of trained models. To overcome these challenges, we introduce Federated Inference, a novel algorithm that we show is consistent in federated environments: even when the data is unevenly distributed and the clients' responses to the server are staggered in time (asynchronous), the algorithm converges to the global optimum. We also present a novel result in system identification in which we extend a method known as Dynamic Mode Decomposition to accommodate input-delayed systems. This advancement improves the accuracy of identifying and controlling systems relevant to privacy-sensitive applications such as smart grids and autonomous vehicles. Privacy is increasingly pertinent as investments in computing infrastructure grow to serve ever-larger client bases, and privacy failures impact an ever-growing number of individuals. This dissertation reports on our efforts to expand the toolkit of data privacy through novel methods and analysis while navigating the challenges of the field.
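The Gaussian and Symmetric alpha-Stable mechanisms described above both add independent noise on the client side before transmission. A minimal sketch of that local noise-injection step is shown below, using scipy's levy_stable with beta = 0 for the symmetric alpha-stable case; the clipping bound, alpha, and scale values are illustrative assumptions, not the dissertation's privacy calibration.

```python
import numpy as np
from scipy.stats import levy_stable

rng = np.random.default_rng(0)

def clip(v, c):
    """Bound each client's contribution so the query has finite sensitivity."""
    n = np.linalg.norm(v)
    return v if n <= c else v * (c / n)

def gaussian_mechanism(v, c=1.0, sigma=0.5):
    """Classic local Gaussian mechanism: yields (epsilon, delta)- (approximate) DP."""
    return clip(v, c) + rng.normal(0.0, sigma, size=v.shape)

def sas_mechanism(v, c=1.0, alpha=1.5, scale=0.5):
    """Symmetric alpha-Stable (SaS) mechanism: beta = 0 makes the stable law symmetric.
    The dissertation argues this family can provide pure epsilon-DP while still being
    applied locally by each client before transmission."""
    noise = levy_stable.rvs(alpha, 0.0, loc=0.0, scale=scale, size=v.shape)
    return clip(v, c) + noise

# Each client perturbs its own update; the server only ever sees noisy vectors.
client_update = rng.normal(size=10)
print(gaussian_mechanism(client_update))
print(sas_mechanism(client_update))
```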
  • Item
    Cardiovascular Physiological Monitoring Based on Video
    (2023) Gebeyehu, Henok; Wu, Min; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Regular, continuous monitoring of the heart is advantageous to maintaining one’s cardiovascular health, as it enables the early detection of potentially life-threatening cardiovascular diseases. Typically, the devices required for continuous monitoring are found in a clinical setting, but recent research developments have advanced remote physiological monitoring capabilities and expanded the options for continuous monitoring from home. This thesis focuses on further extending the monitoring capabilities of consumer electronic devices toward demonstrating the feasibility of reconstructing Electrocardiograms (ECG) via a smartphone camera. First, the relationship between skin tone and remote physiological sensing is examined, since variations in melanin concentration across people of diverse skin tones can affect the sensed signal. In this work, a study is performed to assess the prospect of reducing the performance disparity caused by melanin differences by exploring the sites from which the physiological signal is collected. Second, the physiological signals obtained in the first part are enhanced to improve the signal-to-noise ratio and then used to infer ECG as part of a novel technique that emphasizes interpretability as a guiding principle. The findings in this work have the potential to enable and promote the remote sensing of a physiological signal that is more informative than what is currently possible with remote sensing.
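For context on the video-based sensing step, a generic green-channel remote-photoplethysmography baseline is sketched below: average skin pixels over a region in each frame and band-pass the resulting trace to the heart-rate band. The frame rate, region of interest, and band limits are assumed values; this is not the thesis's actual pipeline or its ECG-inference technique.

```python
import numpy as np
from scipy.signal import butter, filtfilt, welch

FS = 30.0  # assumed camera frame rate (Hz)

def pulse_signal(frames, roi):
    """Average the green channel over a skin region in each frame, then band-pass
    to a typical heart-rate band (0.7-4 Hz, roughly 42-240 bpm)."""
    r0, r1, c0, c1 = roi
    trace = np.array([f[r0:r1, c0:c1, 1].mean() for f in frames])  # green channel
    trace = trace - trace.mean()
    b, a = butter(3, [0.7 / (FS / 2), 4.0 / (FS / 2)], btype="band")
    return filtfilt(b, a, trace)

def heart_rate_bpm(pulse):
    """Dominant frequency of the filtered trace, converted to beats per minute."""
    f, pxx = welch(pulse, fs=FS, nperseg=min(256, len(pulse)))
    return 60.0 * f[np.argmax(pxx)]

# Example with synthetic frames (H x W x 3 uint8 arrays standing in for real video).
frames = [np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8) for _ in range(300)]
print(heart_rate_bpm(pulse_signal(frames, roi=(16, 48, 16, 48))))
```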
  • Item
    Control Theory-Inspired Acceleration of the Gradient-Descent Method: Centralized and Distributed
    (2022) Chakrabarti, Kushal; Chopra, Nikhil; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Mathematical optimization problems are prevalent across various disciplines in science and engineering. Particularly in electrical engineering, convex and non-convex optimization problems are well-known in signal processing, estimation, control, and machine learning research. In many of these contemporary applications, the data points are dispersed over several sources. Restrictions such as industrial competition, administrative regulations, and user privacy have motivated significant research on distributed optimization algorithms for solving such data-driven modeling problems. The traditional gradient-descent method can solve optimization problems with differentiable cost functions. However, the speed of convergence of the gradient-descent method and its accelerated variants is highly influenced by the conditioning of the optimization problem being solved. Specifically, when the cost is ill-conditioned, these methods (i) require many iterations to converge and (ii) are highly unstable against process noise. In this dissertation, we propose novel optimization algorithms, inspired by control-theoretic tools, that can significantly attenuate the influence of the problem's conditioning. First, we consider solving the linear regression problem in a distributed server-agent network. We propose the Iteratively Pre-conditioned Gradient-Descent (IPG) algorithm to mitigate the deleterious impact of the data points' conditioning on the convergence rate. We show that the IPG algorithm has an improved rate of convergence in comparison to both the classical and the accelerated gradient-descent methods. We further study the robustness of IPG against system noise and extend the idea of iterative pre-conditioning to stochastic settings, where the server updates the estimate based on a randomly selected data point at every iteration. In the same distributed environment, we present theoretical results on the local convergence of IPG for solving convex optimization problems. Next, we consider solving a system of linear equations in peer-to-peer multi-agent networks and propose a decentralized pre-conditioning technique. The proposed algorithm converges linearly, with an improved convergence rate compared to the decentralized gradient-descent method. Considering the practical scenario where the computations performed by the agents are corrupted, or a communication delay exists between them, we study the robustness guarantees of the proposed algorithm and a variant of it. We apply the proposed algorithm to decentralized state estimation problems. Further, we develop a generic framework for adaptive gradient methods that solve non-convex optimization problems. Here, we model the adaptive gradient methods in a state-space framework, which allows us to exploit control-theoretic methodology in analyzing Adam and its prominent variants. We then utilize the classical transfer function paradigm to propose new variants of a few existing adaptive gradient methods. Applications on benchmark machine learning tasks demonstrate our proposed algorithms' efficiency. Our findings suggest further exploration of the existing tools from control theory in complex machine learning problems. The dissertation concludes by showing that the potential of the IPG idea goes beyond solving generic optimization problems, through the development of a novel distributed beamforming algorithm and a novel observer for nonlinear dynamical systems, where IPG's robustness serves as a foundation for our designs.
The proposed IPG for distributed beamforming (IPG-DB) facilitates the rapid establishment of communication links with far-field targets while jamming potential adversaries, without assuming any feedback from the receivers and subject to unknown multipath fading in realistic environments. The proposed IPG observer utilizes a non-symmetric pre-conditioner, like IPG, as an approximation of the inverse Jacobian of the observability mapping, such that it asymptotically replicates the Newton observer with the additional advantage of enhanced robustness against measurement noise. Empirical results demonstrate the efficiency of both methods compared to existing methodologies.
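A rough sketch of the iterative pre-conditioning idea on a centralized least-squares problem is given below: a matrix K is driven toward the inverse Hessian alongside the estimate, so the update approaches a Newton step without an explicit matrix inversion. The recursion, step sizes, and problem dimensions are my assumptions for illustration and may differ from the IPG algorithm as stated in the dissertation.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(200, 5)) @ np.diag([1, 1, 1, 1, 30.0])  # ill-conditioned design
x_true = rng.normal(size=5)
b = A @ x_true + 0.01 * rng.normal(size=200)

H = A.T @ A  # Hessian of the least-squares cost

def ipg(iters=200, alpha=None, delta=1.0):
    """Iteratively pre-conditioned gradient descent (sketch): K is driven toward
    H^{-1} alongside the estimate, so the effective step approximates a Newton
    step without ever inverting H directly."""
    alpha = alpha or 1.0 / np.linalg.norm(H, 2)   # keeps the K-recursion stable
    x = np.zeros(5)
    K = np.zeros((5, 5))
    for _ in range(iters):
        K = K - alpha * (H @ K - np.eye(5))       # pre-conditioner recursion
        grad = A.T @ (A @ x - b)
        x = x - delta * K @ grad                  # pre-conditioned gradient step
    return x

print(np.linalg.norm(ipg() - x_true))             # estimation error
```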
  • Item
    Intellectual Property Protection: From Integrated Circuits to Machine Learning Models
    (2022) Aramoon, Omid; Qu, Gang; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The increasing popularity of intellectual property (IP) based design in the semiconductor and artificial intelligence (AI) industry has created a growing market for silicon and machine learning (ML) IPs. The emerging IP market in both sectors has facilitated the exchange of designs and ideas among entities, which in turn has helped speed up innovations, lower R&D costs, and shorten the time-to-market for new products. Nonetheless, two major concerns have been raised in the IP market that may overshadow these benefits and, consequently, discourage suppliers (IP vendors) and consumers (IP buyers) from entering the IP market. First, there is the issue of IP infringements, which negatively impact IP vendors. Given that IPs can easily be copied and distributed, sharing them with other entities in a market environment increases the risk of IP theft and copyright violations. Such infringements would erode the profit margins of IP vendors and discourage them from investing in further IP development. The second issue pertains to IP buyers, who are primarily concerned about how using third-party IPs might impact the safety and security (S&S) of their systems. Many real-world applications require designers to provide S&S assurance for their products. However, this becomes challenging for systems that make use of third-party IPs since IP buyers often lack the necessary knowledge about the core design features of commercial IPs to devise effective S&S measures. In this thesis, our goal is to develop technical solutions to address these two concerns in order to promote participation in the semiconductor and AI IP markets and thereby stimulate faster growth in both sectors. The first part of this thesis is dedicated to addressing vendors' concerns regarding IP infringements by proposing IP watermarking and IP fingerprinting solutions. Protecting IPs through legal means is passive and ineffective unless forensic means such as IP watermarking and IP fingerprinting are available to assist vendors in establishing ownership over pirated IPs and identifying the source of infringement. In this direction, we make four contributions: (1) Our first contribution is a dynamic watermarking scheme for silicon IPs that relies on the multi-functionality of polymorphic gates to hide ownership information in circuits. With the proposed watermarking method, the circuit functions as expected at normal operating temperature; however, when the circuit is heated, the hidden behavior of polymorphic gates is activated and the circuit's functionality changes to reveal the watermark. Experiment results demonstrate that our scheme can embed large multi-bit signatures while incurring low overhead in terms of performance, area, and power consumption. (2) The second contribution is a black-box watermarking method for ML IPs, particularly deep neural network (DNN) classifiers, which we call GradSigns. The proposed scheme embeds the ownership information as a set of stego-constraints on the gradients of model components. Our experiments suggest that GradSigns is extremely robust to counter-watermark attacks and is capable of embedding large multi-bit signatures without sacrificing the performance of the model, two properties that were lacking in the prior art. (3) The third contribution is a fingerprinting scheme for silicon IPs that replaces standard cells holding “Satisfiability Don’t Care” (SDC) conditions with signal-controlled polymorphic gates. 
With the proposed approach, each copy of the IP and its corresponding buyer can be identified based on the configuration of the polymorphic gates, i.e., the IP fingerprint. This attribute can help vendors trace the source of IP piracy if needed. Experiments demonstrate that our method can provide sufficiently strong fingerprints with about half the overhead of similar methods. (4) The fourth and final contribution in this direction is a fingerprinting technique in which the standard testing infrastructure in system-on-chip (SoC) design is repurposed to create unique fingerprints. To this end, we adopt the reconfigurable scan network (RSN) in SoCs and develop a fingerprinting protocol that configures a unique RSN for each sold copy by utilizing different connection styles between scan cells. Experiments show that the proposed method is capable of creating a large number of distinct fingerprints while incurring little overhead. The second part of this thesis is dedicated to addressing IP buyers’ concerns regarding the security and safety risks of using third-party IPs, with an emphasis on ML IPs. Commercial models are primarily marketed as black-box oracles to reduce the risk of IP infringements. However, having little knowledge about the design details of commercial models can complicate IP buyers’ efforts in addressing various S&S threats that may arise in real-world applications of ML. In this thesis, we specifically discuss two such concerns, namely (a) the inaccuracy and overconfidence of DNN classifiers in the presence of anomalous inputs, and (b) the threat from model tampering (or model integrity) attacks, and explain why existing countermeasures are not applicable to black-box commercial DNNs. The following two contributions are made to address this shortcoming: (1) Our first contribution is a tamper detection technique called AID (Attesting the Integrity of DNNs). The proposed method generates a set of input-output test cases that can reveal whether a model has been tampered with. AID does not require access to model parameters and thus is compatible with black-box commercial DNNs. Experimental results show that AID is highly effective and reliable, in that, with at most four test cases, AID is able to detect eight representative integrity attacks with zero false positives. (2) The second contribution in this direction is PAD-Lock, a Power side-channel-based Anomaly Detection framework for black-box DNN classifiers. The proposed method uses power side-channel information gathered during DNN inference as a proxy for the model's inner computation and discovers patterns in it that can be used to detect anomalous inputs such as adversarial and out-of-distribution samples. Upon preliminary examination, PAD-Lock appears to be a practical and effective framework for detecting anomalies in black-box commercial DNNs. In summary, the methods presented in this dissertation fortify the protection of semiconductor and ML IPs against IP infringement activities and assist IP buyers in ensuring the safety and security of systems containing commercial IPs. We believe these technical solutions constitute a major step toward addressing concerns raised in the semiconductor and AI IP markets, and will ultimately encourage more entities to participate in both markets.
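AID is described above only at a high level; the sketch below shows the general pattern of black-box integrity attestation with fixed input-output test cases (record a fingerprint of the expected outputs, later re-query and compare). The probe inputs here are plain random vectors and the toy model is hypothetical; AID's actual test-case generation, which achieves detection with at most four cases, is not reproduced.

```python
import hashlib
import json
import numpy as np

def make_probes(n_probes, input_dim, seed=7):
    """Fixed probe inputs reused for every attestation check."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=(n_probes, input_dim)).astype(np.float32)

def fingerprint(model, probes, decimals=4):
    """Hash the model's (rounded) outputs on the probes.
    Rounding tolerates harmless numeric jitter across runs and hardware."""
    preds = np.round(model(probes), decimals).tolist()
    return hashlib.sha256(json.dumps(preds).encode()).hexdigest()

def attest(model, probes, reference_digest):
    """Black-box check: tampering that changes behavior on any probe is flagged."""
    return fingerprint(model, probes) == reference_digest

# Toy stand-in "model": a linear scorer whose weights might later be tampered with.
W = np.ones((8, 3), dtype=np.float32)
model = lambda x: x @ W

probes = make_probes(4, 8)
reference = fingerprint(model, probes)     # recorded at deployment time

W[0, 0] += 5.0                             # simulated integrity attack
print(attest(model, probes, reference))    # -> False
```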
  • Item
    Advances in Quantitative Characterizations of Electrophysiological Neural Activity
    (2020) Nahmias, David; Kontson, Kimberly L; Simon, Jonathan Z; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Disorders of the brain and nervous system result in more hospitalizations and lost productivity than any other disease group. Electroencephalography (EEG), which measures brain electrical signals from the scalp, is a common neuro-monitoring technique used for diagnostic, rehabilitative, and therapeutic purposes. A quantitative understanding of EEG and of its correlates with patient characteristics could inform the safety and efficacy of technologies that rely on EEG. In this dissertation, a large clinical data set comprising over 35,000 recordings, as well as data from previous research experiments, is utilized to better quantify characteristics of neurological activity. We first propose non-parametric methods of evaluating the consistency of quantitative EEG (qEEG) features by applying novel statistical approaches. These results provide data-driven methods for identifying the qEEG features and spatial characteristics best suited to various applications, and for determining the consistency of novel features using existing data. Such qEEG features are commonly used in feature-based machine learning applications. Further, EEG-driven deep learning has shown promising results in distinguishing recordings of subjects. To better understand the performance of these two machine learning approaches, we assess their ability to distinguish between subjects taking different anticonvulsants. Our methods could successfully discriminate between patients taking either anticonvulsant and those taking no medications solely from neural activity, with similar performance from both feature-based and deep learning approaches. With feature-based methods, it is easier to interpret which qEEG features have the most impact on algorithm performance. However, deep learning applications in EEG can be difficult to interpret in terms of their underlying neurophysiological implications. We propose and validate a method to investigate frequency band importance in EEG-driven deep learning models. The easy perturbation EEG algorithm for spectral importance (easyPEASI) is simpler than previous methods and is applied to the classifications investigated in this work. Up to this point, our work used well-segmented EEG from clinical settings. However, EEG is usually corrupted by noise, which can degrade its utility. We formulate and validate novel approaches to score electrophysiological signal quality based on the presence of noise from various sources. Further, we apply our method to compare and evaluate the performance of existing artifact removal algorithms.
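The perturbation idea behind easyPEASI, as read from the abstract, is sketched below: remove one canonical EEG band at a time and measure the drop in classifier performance. The band edges, sampling rate, filter, and the stand-in classifier are placeholder assumptions rather than the dissertation's implementation.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 250.0  # assumed EEG sampling rate (Hz)
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def remove_band(x, lo, hi):
    """Band-stop filter one frequency band out of each EEG channel."""
    sos = butter(4, [lo, hi], btype="bandstop", fs=FS, output="sos")
    return sosfiltfilt(sos, x, axis=-1)

def band_importance(model_accuracy, X, y):
    """Importance of a band = accuracy on clean EEG minus accuracy after that band
    is removed; larger drops suggest the model relies on that band."""
    baseline = model_accuracy(X, y)
    return {name: baseline - model_accuracy(remove_band(X, lo, hi), y)
            for name, (lo, hi) in BANDS.items()}

# Toy stand-in for a trained EEG classifier's evaluation function.
def dummy_accuracy(X, y):
    alpha_power = (remove_band(X, 8, 13) - X).var(axis=(1, 2))  # crude alpha-power proxy
    return float(((alpha_power > np.median(alpha_power)).astype(int) == y).mean())

X = np.random.randn(40, 8, 1000)          # trials x channels x samples
y = np.random.randint(0, 2, size=40)
print(band_importance(dummy_accuracy, X, y))
```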
  • Item
    TOWARDS BUILDING GENERALIZABLE SPEECH EMOTION RECOGNITION MODELS
    (2019) Sahu, Saurabh; Espy-Wilson, Carol; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Detecting the mental state of a person has implications in psychiatry, medicine, psychology, and human-computer interaction systems, among others. It includes (but is not limited to) a wide variety of problems such as emotion detection, valence-affect-dominance state prediction, mood detection, and detection of clinical depression. In this thesis we focus primarily on emotion recognition. Like any recognition system, building an emotion recognition model consists of two steps: (1) extraction of meaningful features that help in classification, and (2) development of an appropriate classifier. Because speech data is non-invasive and easy to collect, it has become a popular candidate for feature extraction. However, an ideal system should be agnostic to speaker and channel effects. While feature normalization schemes can counter these problems to some extent, we still see a drastic drop in performance when the training and test datasets are mismatched. In this dissertation we explore some novel ways of building models that are more robust to speaker and domain differences. Training discriminative classifiers involves learning a conditional distribution p(y_i|x_i), given a set of feature vectors x_i and the corresponding labels y_i, i = 1, ..., N. For a classifier to be generalizable and not overfit to training data, the resulting conditional distribution p(y_i|x_i) is desired to vary smoothly over the inputs x_i. Adversarial training procedures enforce this smoothness using manifold regularization techniques, which make the model's output distribution more robust to local perturbations added to a data point x_i. In the first part of the dissertation, we investigate two training procedures: (i) adversarial training, where we determine the perturbation direction based on the given labels for the training data, and (ii) virtual adversarial training, where we determine the perturbation direction based only on the output distribution of the training data. We demonstrate the efficacy of adversarial training procedures by performing a k-fold cross-validation experiment on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset and a cross-corpus performance analysis on three separate corpora. We compare their performance to that of a model utilizing other regularization schemes, such as L1/L2 regularization and a graph-based manifold regularization scheme. Results show improvement over a purely supervised approach, as well as better generalization capability in cross-corpus settings. Our second approach to better discriminating between emotions leverages multi-modal learning and automatic speech recognition (ASR) systems toward improving the generalizability of an emotion recognition model that requires only speech as input. Previous studies have shown that emotion recognition models using only acoustic features do not perform satisfactorily in detecting valence level. Text analysis has been shown to be helpful for sentiment classification. We compared classification accuracies obtained from an audio-only model, a text-only model, and a multi-modal system leveraging both, by performing a cross-validation analysis on the IEMOCAP dataset. Confusion matrices show that it is valence detection that improves when textual information is incorporated. In the second stage of experiments, we used three ASR application programming interfaces (APIs) to obtain the transcriptions.
We compare the performance of the multi-modal systems using the ASR transcriptions with each other and with that of a system using ground-truth transcriptions. This is followed by a cross-corpus study. In the third part of the study, we investigate the generalizability of models based on generative adversarial networks (GANs). GANs have gained a lot of attention from the machine learning community due to their ability to learn and mimic an input data distribution. GANs consist of a discriminator and a generator working in tandem, playing a min-max game to learn a target underlying data distribution when fed with data points sampled from a simpler distribution (such as a uniform or Gaussian distribution). Once trained, they allow synthetic generation of examples sampled from the target distribution. We investigate the applicability of GANs for obtaining lower-dimensional representations from the higher-dimensional feature vectors pertinent to emotion recognition. We also investigate their ability to generate synthetic higher-dimensional feature vectors using points sampled from a lower-dimensional prior. Specifically, we investigate two setups: (i) when the lower-dimensional prior from which synthetic feature vectors are generated is pre-defined, and (ii) when the distribution of the lower-dimensional prior is learned from training data. We define the metrics used to measure and analyze the performance of these generative models in different train/test conditions. We perform a cross-validation analysis followed by a cross-corpus study. Finally, we make an attempt towards understanding the relation between two different sub-problems encompassed under mental state detection, namely depression detection and emotion recognition. We propose approaches that can be investigated to build better depression detection models by leveraging our ability to recognize emotions accurately.
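A minimal PyTorch sketch of the supervised adversarial-training variant (perturbing inputs along the gradient of the loss) appears below; the network, feature dimensions, and epsilon are placeholders, and the virtual adversarial variant that uses only the output distribution is omitted.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(40, 64), nn.ReLU(), nn.Linear(64, 4))  # 4 emotion classes
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
EPS = 0.05  # assumed perturbation radius

def adversarial_step(x, y):
    """One training step on clean plus adversarially perturbed inputs.
    The perturbation direction is the gradient of the supervised loss with respect
    to the input, which encourages p(y|x) to vary smoothly around each data point."""
    x = x.clone().requires_grad_(True)
    clean_loss = loss_fn(model(x), y)
    grad_x, = torch.autograd.grad(clean_loss, x, retain_graph=True)
    x_adv = (x + EPS * grad_x.sign()).detach()      # perturb along the loss gradient
    total = clean_loss + loss_fn(model(x_adv), y)
    opt.zero_grad()
    total.backward()
    opt.step()
    return float(total)

x = torch.randn(32, 40)                 # a batch of acoustic feature vectors (assumed dims)
y = torch.randint(0, 4, (32,))          # emotion labels
print(adversarial_step(x, y))
```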
  • Item
    Improving Existing Static and Dynamic Malware Detection Techniques with Instruction-level Behavior
    (2019) Kim, Danny; Barua, Rajeev; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    My Ph.D. focuses on detecting malware by leveraging information obtained at the instruction level. Instruction-level information is obtained by looking at the instructions or disassembly that make up an executable. My initial work focused on using a dynamic binary instrumentation (DBI) tool. A DBI tool enables the study of instruction-level behavior while the malware is executing, which I show to be valuable in detecting malware. To expand on my work with dynamic instruction-level information, I integrated it with machine learning to increase the scalability and robustness of my detection tool. To further increase the scalability of dynamic malware detection, I created a two-stage static-dynamic malware detection scheme aimed at achieving the accuracy of a fully dynamic detection scheme without the high computational resources and time required. Lastly, I show how static analysis-based detection of malware can be improved with machine learning features automatically generated from opcode sequences with the help of convolutional neural networks. The first part of my research focused on obfuscated malware. Obfuscation is the process by which malware tries to hide itself from static analysis and trick disassemblers. I found that by using a DBI tool, I was able not only to detect obfuscation, but also to detect differences in how it occurred in malware versus goodware. Through dynamic program-level analysis, I was able to detect specific obfuscations and use the differing ways in which programs employed them to differentiate malware from goodware. I found that by using the mere presence of obfuscation as a method of detecting malware, I was able to detect previously undetected malware. I then focused on using my knowledge of dynamic program-level features to build a highly accurate machine learning-based malware detection tool. Machine learning is useful in malware detection because it can process a large amount of data to determine meaningful relationships that distinguish malware from benign programs. Through the integration of machine learning, I was able to expand my obfuscation detection schemes to address a broader class of malware, which ultimately led to a malware detection tool that can detect 98.45% of malware with a 1% false positive rate. Understanding the pitfalls of dynamic analysis of malware, I focused on creating a more efficient method of detecting malware. Malware detection methods fall into three categories: static analysis, dynamic analysis, and hybrid approaches. Static analysis is fast and effective for detecting previously seen malware, whereas dynamic analysis can be more accurate and robust against zero-day or polymorphic malware, but at the cost of a high computational load. Most modern defenses use a hybrid approach combining static and dynamic analysis, but remain suboptimal. I created a two-phase malware detection tool that approaches the accuracy of a dynamic-only system at a small fraction of its computational cost, while maintaining real-time malware detection timeliness similar to a static-only system, thus achieving the best of both approaches. Lastly, my Ph.D. focused on reducing the need for manual feature generation by utilizing Convolutional Neural Networks (CNNs) to automatically generate feature vectors from raw input data. My work shows that using raw opcode sequences from static disassembly with a CNN model can automatically produce feature vectors that are useful for detecting malware.
Because this process is automated, it provides a scalable method of consistently producing useful features for detecting malware without human intervention or labor.
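As an illustration of the last point, a small 1-D convolutional network over opcode token sequences is sketched below; the vocabulary size, sequence length, and layer sizes are assumptions rather than the architecture used in this work.

```python
import torch
import torch.nn as nn

VOCAB = 256      # number of distinct opcodes after tokenization (assumed)
SEQ_LEN = 512    # opcodes per executable fed to the model (assumed)

class OpcodeCNN(nn.Module):
    """Embeds each opcode, slides 1-D filters over the sequence so that useful
    n-gram-like features are learned automatically, then classifies malware vs. benign."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, 32)
        self.conv = nn.Conv1d(32, 64, kernel_size=5, padding=2)
        self.head = nn.Linear(64, 2)

    def forward(self, opcodes):                    # opcodes: (batch, SEQ_LEN) integer ids
        x = self.embed(opcodes).transpose(1, 2)    # -> (batch, 32, SEQ_LEN)
        x = torch.relu(self.conv(x))
        x = x.amax(dim=2)                          # global max pool over positions
        return self.head(x)

model = OpcodeCNN()
batch = torch.randint(0, VOCAB, (8, SEQ_LEN))
logits = model(batch)
print(logits.shape)                               # torch.Size([8, 2])
```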
  • Item
    Towards a Fast and Accurate Face Recognition System from Deep Representations
    (2019) Ranjan, Rajeev; Chellappa, Rama; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The key components of a machine perception algorithm are feature extraction followed by classification or regression. The features representing the input data should have the following desirable properties: 1) they should contain the discriminative information required for accurate classification, 2) they should be robust and adaptive to variations in the input data due to illumination, translation/rotation, resolution, and input noise, and 3) they should lie on a simple manifold for easy classification or regression. Over the years, researchers have come up with various hand-crafted techniques to extract meaningful features. However, these features do not perform well for data collected in unconstrained settings due to large variations in appearance and other nuisance factors. Recent developments in deep convolutional neural networks (DCNNs) have shown impressive performance improvements in various machine perception tasks such as object detection and recognition. DCNNs are highly non-linear regressors because of the presence of hierarchical convolutional layers with non-linear activation. Unlike hand-crafted features, DCNNs learn the feature extraction and feature classification/regression modules from the data itself in an end-to-end fashion. This enables DCNNs to be robust to variations present in the data while improving their discriminative ability. Ever-increasing computation power and the availability of large datasets have led to significant performance gains from DCNNs. However, these developments in deep learning are not directly applicable to face analysis tasks due to large variations in illumination, resolution, viewpoint, and attributes of faces acquired in unconstrained settings. In this dissertation, we address this issue by developing efficient DCNN architectures and loss functions for multiple face analysis tasks such as face detection, pose estimation, landmark localization, and face recognition from unconstrained images and videos. In the first part of this dissertation, we present two face detection algorithms based on deep pyramidal features. The first face detector, called DP2MFD, utilizes the concepts of the deformable parts model (DPM) in the context of deep learning. It is able to detect faces of various sizes and poses in unconstrained conditions. It reduces the gap between training and testing of DPM on deep features by adding a normalization layer to the DCNN. The second face detector, called Deep Pyramid Single Shot Face Detector (DPSSD), is fast and capable of detecting faces with large scale variations (especially tiny faces). It makes use of the inbuilt pyramidal hierarchy present in a DCNN, instead of creating an image pyramid. Extensive experiments on publicly available unconstrained face detection datasets show that both face detectors are able to capture the meaningful structure of faces and perform significantly better than many traditional face detection algorithms. In the second part of this dissertation, we present two algorithms for simultaneous face detection, landmark localization, pose estimation, and gender recognition using DCNNs. The first method, called HyperFace, fuses the intermediate layers of a DCNN using a separate CNN followed by a multi-task learning algorithm that operates on the fused features. The second approach, All-In-One Face, extends HyperFace to incorporate the additional tasks of face verification, age estimation, and smile detection.
HyperFace and All-In-One Face exploit the synergy among the tasks, which improves the performance of each individual task. In the third part of this dissertation, we focus on improving the task of face verification by designing a novel loss function that maximizes the inter-class distance and minimizes the intra-class distance in the feature space. We propose a new loss function, called Crystal Loss, that adds an L2-constraint to the feature descriptors, restricting them to lie on a hypersphere of fixed radius. This module can be easily implemented using existing deep learning frameworks. We show that integrating this simple step in the training pipeline significantly boosts the performance of face verification. We additionally describe a deep learning pipeline for unconstrained face identification and verification which achieves state-of-the-art performance on several benchmark datasets. We provide the design details of the various modules involved in automatic face recognition: face detection, landmark localization and alignment, and face identification/verification. We present experimental results for end-to-end face verification and identification on the IARPA Janus Benchmarks A, B, and C (IJB-A, IJB-B, IJB-C), and the Janus Challenge Set 5 (CS5). Though DCNNs have surpassed human-level performance on tasks such as object classification and face verification, they can easily be fooled by adversarial attacks. These attacks add a small perturbation to the input image that causes the network to misclassify the sample. In the final part of this dissertation, we focus on safeguarding DCNNs and neutralizing adversarial attacks through compact feature learning. In particular, we show that learning features in a closed and bounded space improves the robustness of the network. We explore the effect of Crystal Loss, which enforces compactness in the learned features and thus enhances robustness to adversarial perturbations. Additionally, we propose compact convolution, a novel method of convolution that, when incorporated in conventional CNNs, improves their robustness. Compact convolution ensures feature compactness at every layer such that features are bounded and close to each other. Extensive experiments show that Compact Convolutional Networks (CCNs) neutralize multiple types of attacks and perform better than existing methods in defending against adversarial attacks, without incurring any additional training overhead compared to CNNs.
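A sketch of the L2-constraint described for Crystal Loss is given below: feature descriptors are normalized and rescaled to a fixed radius before an ordinary softmax cross-entropy. The radius, feature dimension, and number of identities are illustrative choices, not the values used in the dissertation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrystalLossHead(nn.Module):
    """L2-constrained softmax: normalize the feature descriptor, rescale it to a
    fixed radius alpha (so all features lie on a hypersphere), then apply an
    ordinary classification layer followed by cross-entropy."""
    def __init__(self, feat_dim=512, num_ids=1000, alpha=50.0):
        super().__init__()
        self.alpha = alpha
        self.fc = nn.Linear(feat_dim, num_ids)

    def forward(self, features, labels):
        constrained = self.alpha * F.normalize(features, p=2, dim=1)
        logits = self.fc(constrained)
        return F.cross_entropy(logits, labels)

head = CrystalLossHead()
features = torch.randn(16, 512)     # descriptors from any face DCNN backbone
labels = torch.randint(0, 1000, (16,))
print(head(features, labels))
```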
  • Item
    Towards robust and domain invariant feature representations in Deep Learning
    (2018) Sankaranarayanan, Swaminathan; Chellappa, Rama; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    A fundamental problem in perception-based systems is to define and learn representations of the scene that are more robust and adaptive to several nuisance factors. Over the recent past, for a variety of tasks involving images, learned representations have been empirically shown to outperform handcrafted ones. However, their inability to generalize across varying data distributions poses the following question: do representations learned using deep networks just fit a given data distribution, or do they sufficiently model the underlying structure of the problem? This question can be understood using a simple example: if a learning algorithm is shown a number of images of a simple handwritten digit, then the representation learned should be generic enough to identify the same digit in a different form. With regard to deep networks, although learned representations have been shown to be robust to various forms of synthetic distortion such as random noise, they fail in the presence of more implicit forms of naturally occurring distortions. In this dissertation, we propose approaches to mitigate the effect of such distortions and, in the process, study some vulnerabilities of deep networks to small imperceptible changes in the given input. The research problems that comprise this dissertation lie at the intersection of two open topics: (1) studying and developing methods that enable neural networks to learn robust representations, and (2) improving the generalization of neural networks across domains. The first part of the dissertation approaches the problem of robustness from two broad viewpoints: robustness to external nuisance factors that occur in the data, and robustness (or a lack thereof) to perturbations of the learned feature space. In the second part, we focus on learning representations that are invariant to external covariate shift, more commonly termed domain shift. Towards learning representations robust to external nuisance factors, we propose an approach that couples a deep convolutional neural network with a low-dimensional discriminative embedding learned using triplet probability constraints to solve the unconstrained face analysis problem. While previous approaches in this area have proposed scalable yet ad-hoc solutions to this problem, we propose a principled and parameter-free formulation based on maximum likelihood estimation. In addition, we employ the principle of transfer learning to realize a deep network architecture that can train faster and on less data, yet significantly outperforms existing approaches on the unconstrained face verification task. We demonstrate the robustness of the approach to challenges including age, pose, blur, and clutter by performing clustering experiments on challenging benchmarks. Recent seminal works have shown that deep neural networks are susceptible to visually imperceptible perturbations of the input. In this dissertation, we build on their ideas in two unique ways: (a) we show that neural networks that perform pixel-wise semantic segmentation tasks also suffer from this vulnerability, despite being trained with more information compared to simple classification tasks.
In addition, we present a novel self-correcting mechanism in segmentation networks and provide an efficient way to generate such perturbations; (b) we present a novel approach to regularize deep neural networks by perturbing intermediate layer activations in an efficient manner, thereby exploring the trade-off between conventional regularization and adversarial robustness within the context of very deep networks. Both of these works provide interesting directions towards understanding the secure nature of deep learning algorithms. While humans find it extremely simple to generalize their knowledge across domains, machine learning algorithms, including deep neural networks, suffer from the problem of domain shift across what are commonly termed 'source' (S) and 'target' (T) distributions. Let the data that a learning algorithm is trained on be sampled from S. If the real data used to evaluate the model is then sampled from T, the learned model underperforms on the target data. This inability to generalize is characterized as domain shift. Our attempt to address this problem involves learning a common feature subspace in which the distance between the source and target distributions is minimized. Estimating the distance between different domains is highly non-trivial and is an open research problem in itself. In our approach, we parameterize the distance measure using a Generative Adversarial Network (GAN). A GAN involves a two-player game between two mappings, commonly termed the generator and the discriminator. These mappings are learned simultaneously through an adversarial game, i.e., by letting the generator try to fool the discriminator while the discriminator learns to outperform the generator. This adversarial game can be formulated as a minimax problem. In our approach, we learn three mappings simultaneously: the generator, the discriminator, and a feature mapping that contains information about both the content and the domain of the input. We deploy a two-level minimax game, where the first level is a competition between the generator and a discriminator similar to a GAN, and the second level is a game in which the feature mapping attempts to fool the discriminator, thereby introducing domain invariance in the learned feature representation. We have extensively evaluated this approach on different tasks such as object classification and semantic segmentation, where we achieve state-of-the-art results across several real datasets. In addition to its conceptual novelty, our approach presents a more efficient and scalable solution compared to other approaches that attempt to solve the same problem. In the final part of this dissertation, we describe some ongoing efforts and future directions of research. Inspired by the study of perturbations described above, we propose a novel metric for effectively choosing which pixels to label in a given image for a pixel-wise segmentation task. This has the potential to significantly reduce the labeling effort, and our preliminary results for the task of semantic segmentation are encouraging. While the domain adaptation approach proposed above considered static images, we propose an extension to video data aided by the use of recurrent neural networks. Use of full temporal information, when available, provides the perceptual system with additional context to disambiguate among the smaller object classes that commonly occur in real scenes.
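The dissertation's adaptation method learns a generator, a discriminator, and a feature mapping in a two-level minimax game; the sketch below shows only the simpler core idea of a feature extractor trained to fool a domain discriminator while solving the labeled source task. All architectures, losses, and hyper-parameters here are chosen for illustration, not taken from this work.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
feat = nn.Sequential(nn.Linear(100, 64), nn.ReLU(), nn.Linear(64, 32))   # feature mapping
disc = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 1))     # domain discriminator
clf = nn.Linear(32, 10)                                                   # task classifier (source labels)
opt_fd = torch.optim.Adam(list(feat.parameters()) + list(clf.parameters()), lr=1e-3)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
ce = nn.CrossEntropyLoss()

def train_step(xs, ys, xt):
    """One round of the minimax game: the discriminator learns to tell source from
    target features, then the feature mapping is updated to fool it while still
    solving the labeled source task, pushing the two domains together in feature space."""
    # 1) discriminator step (features detached so only disc is updated)
    d_loss = bce(disc(feat(xs).detach()), torch.ones(len(xs), 1)) + \
             bce(disc(feat(xt).detach()), torch.zeros(len(xt), 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()
    # 2) feature/classifier step: fool the discriminator on target features
    f_loss = ce(clf(feat(xs)), ys) + bce(disc(feat(xt)), torch.ones(len(xt), 1))
    opt_fd.zero_grad()
    f_loss.backward()
    opt_fd.step()
    return float(d_loss), float(f_loss)

xs, ys = torch.randn(32, 100), torch.randint(0, 10, (32,))   # labeled source batch
xt = torch.randn(32, 100)                                    # unlabeled target batch
print(train_step(xs, ys, xt))
```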
  • Item
    OPTIMIZATION ALGORITHMS USING PRIORS IN COMPUTER VISION
    (2018) Shah, Sohil Atul; Goldstein, Tom; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Over the years, many computer vision models, some inspired by human behavior, have been developed for various applications. However, only a handful of them are popular and widely used. Why? There are two major factors: 1) most of these models do not have an efficient numerical algorithm and hence are computationally very expensive; 2) many models, being too generic, cannot capitalize on problem-specific prior information and thus demand rigorous hyper-parameter tuning. In this dissertation, we design fast and efficient algorithms that leverage application-specific priors to solve unsupervised and weakly-supervised problems. Specifically, we focus on developing algorithms to impose structured priors, model priors, and label priors during the inference and/or learning of vision models. In many applications, it is known a priori that a signal is smooth and continuous in space. The first part of this work is focused on improving unsupervised learning mechanisms by explicitly imposing these structured priors in an optimization framework using different regularization schemes. This led to the development of fast algorithms for robust recovery of signals from compressed measurements, image denoising, and data clustering. Moreover, by employing a re-descending robust penalty on the structured regularization terms and applying duality, we reduce our clustering formulation to the optimization of a single continuous objective. This enabled the integration of clustering processes in an end-to-end feature learning pipeline. In the second part of our work, we exploit inherent properties of established models to develop efficient solvers for SDPs, GANs, and semantic segmentation. We consider models for several different problem classes. a) Certain non-convex models in computer vision (e.g., BQP) are popularly solved using convex SDPs after lifting to a high-dimensional space. However, this computationally expensive approach limits these methods to small matrices. We develop a fast and approximate algorithm that directly solves the original non-convex formulation using biconvex relaxations and known rank information. b) Widely popular adversarial networks are difficult to train, as they suffer from instability issues. This is because optimizing adversarial networks corresponds to finding a saddle point of a loss function. We propose a simple prediction method that enables faster training of various adversarial networks using larger learning rates without any instability problems. c) Semantic segmentation models must learn long-distance contextual information while retaining high spatial resolution at the output. Existing models achieve this at the cost of computationally expensive and memory-exhaustive training/inference. We design a stacked u-net model that can repeatedly process top-down and bottom-up features. Our smallest model exceeds ResNet-101 performance on PASCAL VOC 2012 by 4.5% IoU with ∼7× fewer parameters. Next, we address the problem of learning heterogeneous concepts from internet videos using mined label tags. Given a large number of videos, each with multiple concepts and labels, the idea is to teach machines to automatically learn these concepts by leveraging weak labels. We formulate this as a co-clustering problem and develop a novel Bayesian non-parametric, weakly supervised Indian buffet process model that additionally enforces a paired-label prior between concepts. In the final part of this work, we consider an inverse approach: learning data priors from a given model.
Specifically, we develop a numerically efficient algorithm for estimating the log-likelihood of data samples from GANs. The approximate log-likelihood function is used for outlier detection and data augmentation for training classifiers.
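The "simple prediction method" for stabilizing adversarial training mentioned above can be illustrated on a toy bilinear saddle-point problem: one player is updated against an extrapolated (predicted) copy of the other player's parameters rather than its current ones. The toy objective and step size below are assumptions for illustration, not the work's experiments.

```python
def saddle_demo(predict, steps=200, lr=0.2):
    """Toy saddle-point problem min_x max_y x*y. Plain alternating gradient steps
    keep orbiting the saddle without converging; updating the second player against
    a predicted (extrapolated) copy of the first player damps the orbit toward (0, 0)."""
    x, y = 1.0, 1.0
    for _ in range(steps):
        x_new = x - lr * y                            # minimizer's step (d(xy)/dx = y)
        x_bar = 2 * x_new - x if predict else x_new   # one-step lookahead of the minimizer
        y = y + lr * x_bar                            # maximizer responds to the prediction
        x = x_new
    return (x * x + y * y) ** 0.5                     # distance from the saddle at (0, 0)

print("plain     :", saddle_demo(predict=False))
print("predicted :", saddle_demo(predict=True))
```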