Electrical & Computer Engineering

Permanent URI for this community: http://hdl.handle.net/1903/2234

Search Results

Now showing 1 - 10 of 331
  • Item
    ROBUST FACIAL LANDMARKS LOCALIZATION WITH APPLICATIONS IN FACIAL BIOMETRICS
    (2019) Kumar, Amit; Chellappa, Rama; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Localization of regions of interest in images and videos is a well-studied problem in the computer vision community. Usually localization tasks imply localization of objects in a given image, such as detection and segmentation of objects in images. However, the regions of interest can be limited to a single pixel, as in the task of facial landmark localization or human pose estimation. This dissertation studies robust facial landmark detection algorithms for faces in the wild using learning methods based on Convolution Neural Networks. Detection of specific keypoints on face images is an integral pre-processing step in facial biometrics and numerous other applications including face verification and identification. Detecting keypoints allows face images to be aligned to a canonical coordinate system using geometric transforms such as similarity or affine transformations, mitigating the adverse effects of rotation and scaling. This challenging problem has become more attractive in recent years as a result of advances in deep learning and the release of more unconstrained datasets. The research community is pushing boundaries to achieve better and better performance on unconstrained images, where the images are diverse in pose, expression and lighting conditions. Over the years, researchers have developed various hand-crafted techniques to extract meaningful features from images, most of them being appearance and geometry-based features. However, these features do not perform well for data collected in unconstrained settings due to large variations in appearance and other nuisance factors. Convolution Neural Networks (CNNs) have become prominent because of their ability to extract discriminating features. Unlike the hand-crafted features, DCNNs perform feature extraction and feature classification from the data itself in an end-to-end fashion. 
This enables the DCNNs to be robust to variations present in the data and at the same time improve their discriminative ability. In this dissertation, we discuss three different methods for facial keypoint detection based on Convolution Neural Networks. The methods are generic and can be extended to a related problem of keypoint detection for human pose estimation. The first method, called Cascaded Local Deep Descriptor Regression, uses deep features extracted around local points to learn linear regressors for incrementally correcting the initial estimate of the keypoints. In the second method, called KEPLER, we develop efficient Heatmap CNNs to directly learn the non-linear mapping between the input and target spaces. We also apply different regularization techniques to tackle the effects of imbalanced data and vanishing gradients. In the third method, we model the spatial correlation between different keypoints using Pose Conditioned Convolution Deconvolution Networks (PCD-CNN) while at the same time making it pose agnostic by disentangling pose from the face image. Next, we show an application of facial landmark localization used to align the face images for the task of apparent age estimation of humans from unconstrained images. In the fourth part of this dissertation we discuss the impact of good quality landmarks on the task of face verification. Previously proposed methods perform with reasonable accuracy on high resolution and good quality images, but fail when the input image suffers from degradation. To this end, we propose a semi-supervised method which aims at predicting landmarks in the low quality images. This method learns to predict landmarks in low resolution images by learning to model the learning process of high resolution images. 
In this algorithm, we use Generative Adversarial Networks, which first learn to model the distribution of real low resolution images, after which another CNN learns to model the distribution of heatmaps on the images. Additionally, we propose another high quality facial landmark detection method, which is currently state of the art. Finally, we discuss the extension of ideas developed for facial keypoint localization to the task of human pose estimation, which is one of the important cues for Human Activity Recognition. As in PCD-CNN, the parts of the human body can also be modelled in a tree structure, where the relationships between these parts are learnt through convolutions while being conditioned on the 3D pose and orientation. Another interesting avenue for research is extending facial landmark localization to naturally degraded images.
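The alignment step mentioned in the abstract above (mapping detected keypoints to a canonical coordinate system with a similarity transform) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function names `fit_similarity` and `apply_similarity`, the three-point canonical template, and the use of complex numbers to represent 2D points are all assumptions made for illustration. The fit is an ordinary least-squares similarity (rotation, uniform scale, translation, no reflection).

```python
def fit_similarity(src, dst):
    """Least-squares similarity transform mapping src points onto dst.

    Points are (x, y) tuples. Returns complex numbers (a, b) such that
    z' = a * z + b, where abs(a) is the scale and the argument of a is
    the rotation angle. Hypothetical helper, for illustration only.
    """
    zs = [complex(x, y) for x, y in src]
    ws = [complex(x, y) for x, y in dst]
    n = len(zs)
    z_mean = sum(zs) / n
    w_mean = sum(ws) / n
    # Closed-form least-squares solution on the centred point sets.
    num = sum((w - w_mean) * (z - z_mean).conjugate() for z, w in zip(zs, ws))
    den = sum(abs(z - z_mean) ** 2 for z in zs)
    a = num / den
    b = w_mean - a * z_mean
    return a, b


def apply_similarity(a, b, points):
    """Apply z' = a * z + b to a list of (x, y) points."""
    out = []
    for x, y in points:
        w = a * complex(x, y) + b
        out.append((w.real, w.imag))
    return out


# Assumed canonical template: left eye, right eye, nose tip.
canonical = [(30.0, 40.0), (70.0, 40.0), (50.0, 60.0)]
# Simulated detections: the template rotated by 90 degrees, scaled by 2, shifted.
detected = apply_similarity(complex(0, 2), complex(10, 5), canonical)

a, b = fit_similarity(detected, canonical)
aligned = apply_similarity(a, b, detected)
```

In practice the same `(a, b)` would be used to warp the whole face image, not only the keypoints; that step is omitted here.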
  • Item
    Nighttime Photovoltaic Cells: Electrical power generation by optically coupling with deep space
    (2019) Deppe, Tristan; Munday, Jeremy N; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Photovoltaics possess significant potential due to the abundance of solar power incident on earth; however, they can only generate electricity during daylight hours. In order to produce electrical power after the sun has set, we consider an alternative photovoltaic concept that uses the earth as a heat source and the night sky as a heat sink, resulting in a “nighttime photovoltaic cell” that employs thermoradiative photovoltaics and radiative cooling to output as much as 10 W/m^2 from ambient radiation. This thesis will discuss the principles of thermoradiative photovoltaics, the theoretical limits of coupling a device with deep space, the potential of advanced radiative cooling techniques to enhance their performance, and the practical limits, scalability, and integrability of this nighttime photovoltaic concept.
  • Item
    Planning, Monitoring and Learning with Safety and Temporal Constraints for Robotic Systems
    (2019) Lin, Zhenyu; Baras, John S; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    In this thesis, we address the problem of planning, monitoring and learning in robotic systems, while considering safety and time constraints. Motion and action planning for robotic systems is important for real, physical world applications. Robots are capable of performing repetitive tasks at speeds and accuracies that far exceed those of human operators and are widely used in manufacturing, medical fields and even transportation. Planning commonly refers to a process of converting high-level task specifications into low-level control commands that can be executed on the system of interest. Time behavior is a most important issue for the autonomous systems of interest, and it is critical for many robotic tasks. Most state of the art methods, however, are not capable of providing the framework needed for the autonomous systems to plan under finite time constraints. Safety and time constraints are two important aspects of the plan. We are interested in the safety of the plan, such as "Can the robot reach the goal without collision?". We are also interested in the time constraints for the plan, such as "Can the robot finish this task after 3 minutes but no later than 5 minutes?". These types of tasks are important to understand in robot search and rescue or cooperative robotic production lines. In this thesis, we address these problems with two different approaches. The first is a timed automata based approach, which focuses on a more high-level, abstracted result with less computational requirement. The other involves converting the problem into a mixed integer linear programming (MILP) problem with more low-level control details, but requires higher computational power. Both methods are able to automatically generate plans that are guaranteed to be correct. The robotic systems may behave differently at runtime and may not be able to execute the task perfectly as planned. 
Given that a robotic system is naturally cyber-physical, and malfunctions can have safety consequences, monitoring the system’s behavior at runtime can be key to safe operation. Therefore, it is important to consider both time and space tolerances during the planning phase, and also design runtime monitors for error detection and possible self-correction. We provide an optimization-based formulation which takes the tolerances into account, and we have designed runtime monitors to monitor the status of the systems, as well as an event-triggered model predictive controller for self-correction. Learning is another very important aspect of the robotics field. We hope to provide the robot with only high-level task specifications and have the robot learn to accomplish the task. Thus, in the next part of this thesis, we discuss how the robot could learn to accomplish tasks specified by metric interval temporal logic, and how the robot could replan and self-correct if the initial plan fails to execute correctly. As the field of robotics is expanding from the fixed environment of a production line to complex human environments, robots are required to perform increasingly human-like manipulation tasks. Thus, for the last aspect of the thesis, we consider a manipulation task with a dexterous robotic hand, the Shadow Hand. We collected a multimodal haptic-vision dataset and propose a framework for self-assurance slippage detection and correction. We provide a simulation and also a real-world implementation with a UR10 and Shadow Hand robotic system.
  • Item
    Photodetection using ultrathin metal films
    (2019) Krayer, Lisa J.; Munday, Jeremy N; Goldsman, Neil; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Silicon is the most widely used material for visible photodetection, with extensive applications in both consumer and industrial products. Further, its excellent optoelectronic properties and natural abundance have made it nearly ideal for microelectronic devices and solar cells. However, silicon's lack of absorption in the infrared limits its use in infrared detectors and imaging sensors, severely constraining its implementation in telecommunications for low-cost integrated optical circuitry. In this thesis, we show that this limitation can be overcome by exploiting resonant absorption in ultrathin metal films (< 20 nm). Our approach paves the way to implement scalable, lithography-free, and low-cost silicon-based optoelectronics beyond the material bandgap. Light absorption in metal films can excite hot carriers, which are useful for photodetection, solar energy conversion, and many other applications. However, metals are highly reflective, and therefore, careful optical design is required to achieve high absorption in these films. Through appropriate optical design, we achieved a Fabry-Perot-like resonance in ultrathin metal films deposited on a semiconductor, enabling > 70% light absorption below the bandgap of the semiconductor. We experimentally demonstrate this phenomenon with four ultrathin planar metal films: Pt, Fe, Cr, and Ti. These metals were chosen to satisfy the resonant condition for high absorption over a wide range of wavelengths, and with these designs we realize a near-infrared imaging detector. In addition, we utilize an index-near-zero (INZ) substrate to further improve the absorption to near-unity. By employing aluminum-doped zinc oxide (AZO) as the INZ medium in the near-infrared range, we enhance the metal film absorption by nearly a factor of 2. 
To exploit this absorption enhancement in an optoelectronic device, we fabricate a Schottky photodiode with a Pt film on Si and find that the photocurrent generated in the photodiode is enhanced by > 80% with the INZ substrate. The enhancement arises from a combination of improved carrier generation and carrier transport resulting from the addition of the AZO film. Finally, we explore the tunability of material properties through alloying metals. Alloying of metals provides a vast parameter space for tuning of material, chemical, and mechanical properties, impacting disciplines ranging from photonics and catalysis to aerospace. We demonstrate that AgAu alloys provide an ideal model system for controlling the optical and electrical responses in ultrathin metal films for hot carrier photodetectors with improved performance. While pure Ag and Au have long hot carrier attenuation lengths > 20 nm, their optical absorption is insufficient for high efficiency devices. We find that alloying Ag and Au enhances the absorption by ~50% while maintaining attenuation lengths > 15 nm, currently limited by grain boundary scattering. Further, our density functional theory analysis shows that the addition of small amounts of Au to the Ag lattice significantly enhances the hot hole generation rate.
  • Item
    Deep Neural Networks for Radio Frequency Fingerprinting
    (2019) Merchant, Kevin; Chellappa, Rama; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    As the Internet of Things (IoT) continues to expand, there is a growing necessity for improved techniques to authenticate the identity of wireless transmitters to prevent unauthorized network access. In this dissertation, we develop a series of physical-layer authentication, or radio frequency (RF) fingerprinting, techniques which utilize methods from deep learning to train convolutional and recurrent neural network models to verify the identity of wireless transmitters which meet the IEEE 802.15.4 standard. First, we develop a technique which utilizes a convolutional neural network (CNN) to identify or verify the identity of a transmitter from which a time-domain complex baseband signal was recorded. This technique relies on an extensive pre-processing sequence to remove sources of potential bias and trivial features from the received waveforms, and derives an estimated error signal from each recording from which the CNN learns discriminatory features. We demonstrate the effectiveness of the technique on a set of seven off-the-shelf ZigBee devices recorded outside in an urban environment, as well as in a laboratory environment with artificial noise over a wide range of signal-to-noise ratios (SNRs). Next, we train a series of models which utilize both convolutional and recurrent elements to improve the performance of the previous technique in the presence of high levels of noise and expand the evaluation to a larger set of twenty-five devices. We evaluate several realistic scenarios, including the performance in typical multipath environments and the ability to correctly reject previously unseen devices. In order to justify the proposed pre-processing sequence, we present experimental results that demonstrate weaknesses in fingerprint verification classifiers in which frequency synchronization is not performed. 
Finally, we present a simple technique to reduce the amount of memory required for a collection of fingerprint models by up to 95% without loss of performance. To further enhance the security of the trained fingerprint models, we propose a generative adversarial network (GAN) architecture and training procedure to provide additional training examples for the classifiers. We show that fingerprint classifiers that are trained exclusively on real devices cannot reliably reject GAN-generated signals. Furthermore, we illustrate that augmenting the training process of the fingerprint models with GAN-generated signals reduces this vulnerability, even if the GANs used for training and inference are different. Finally, we assess the practicality of transferring an RF fingerprint model from one receiver to another. Experimentally, we demonstrate significant degradation in classification performance when a fingerprint model is learned using signals recorded on one receiver and evaluated using signals recorded on another receiver. First, we show that generalization may be improved by including multiple receivers in the training process. Then, we develop a calibration procedure whereby models learned on a single receiver can be transferred without alteration to another receiver by learning a transformation function, implemented as a residual neural network, to model the variations between the two receivers. We perform several experiments with ten commercial receivers to confirm the effectiveness of the technique under realistic constraints.
  • Item
    The Role of Information in Multi-Agent Decision Making
    (2019) Raghavan, Aneesh; Baras, John S; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Networked multi-agent systems have become an integral part of many engineering systems. Collaborative decision making in multi-agent systems poses many challenges. In this thesis, we study the impact of information and its availability to agents on collaborative decision making in multi-agent systems. We consider the problem of detecting Markov and Gaussian models from observed data using two observers. We consider two Markov chains and two observers. Each observer observes a different function of the state of the true unknown Markov chain. Given the observations, the aim is to find which of the two Markov chains has generated the observations. We formulate a block binary hypothesis testing problem for each observer and show that the decision for each observer is a function of the local likelihood ratio. We present a consensus scheme for the observers to agree on their beliefs, and the asymptotic convergence of the consensus decision to the true hypothesis is proven. A similar problem framework is considered for the detection of Gaussian models using two observers. A sequential hypothesis testing problem is formulated for each observer and solved using the local likelihood ratio. We present a consensus scheme taking into account the random and asymmetric stopping time of the observers. The notion of "value of information" is introduced to understand the "usefulness" of the information exchanged to achieve consensus. Next, we consider the binary hypothesis testing problem with two observers. There are two possible states of nature. There are two observers which collect observations that are statistically related to the true state of nature. The two observers are assumed to be synchronous. Given the observations, the objective of the observers is to collaboratively find the true state of nature. We consider centralized and decentralized approaches to solve the problem. 
In each approach there are two phases: (1) probability space construction: the true hypothesis is known, observations are collected to build empirical joint distributions between hypothesis and the observations; (2) given a new set of observations, hypothesis testing problems are formulated for the observers to find their individual beliefs about the true hypothesis. Consensus schemes for the observers to agree on their beliefs about the true hypothesis are presented. The rate of decay of the probability of error in the centralized approach and rate of decay of the probability of agreement on the wrong belief in the decentralized approach are compared. Numerical results comparing the centralized and decentralized approaches are presented. All propositions from the set of events for an agent in a multi-agent system might not be simultaneously verifiable. We study the concepts of event-state-operation structure and relationship of incompatibility from literature and use them as a tool to study the structure of the set of events. We present an example from multi-agent hypothesis testing where the set of events does not form a Boolean algebra, but forms an ortholattice. A possible construction of a 'noncommutative probability space', accounting for incompatible events (events which cannot be simultaneously verified) is discussed. As a possible decision-making problem in such a probability space, we consider the binary hypothesis testing problem. We present two approaches to this decision-making problem. In the first approach, we represent the available data as coming from measurements modeled via projection valued measures (PVM) and retrieve the results of the underlying detection problem solved using classical probability models. In the second approach, we represent the measurements using positive operator valued measures (POVM). We prove that the minimum probability of error achieved in the second approach is the same as in the first approach. 
Finally, we consider the binary hypothesis testing problem with learning of empirical distributions. The true distributions of the observations under either hypothesis are unknown. Empirical distributions are estimated from observations. A sequence of detection problems is solved using the sequence of empirical distributions. The convergence of the information state and optimal detection cost under empirical distributions to the information state and optimal detection cost under the true distribution is shown. Numerical results on the convergence of optimal detection cost are presented.
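The idea of deciding between two candidate Markov chains from a likelihood ratio, described in the abstract above, can be sketched in a simplified form. The sketch below is an illustration under stated assumptions and not the thesis's formulation: it assumes a single observer that sees the chain state directly (the thesis has two observers, each seeing a function of the state), it assumes both hypotheses share the same initial distribution so that only transitions contribute, and the names `log_likelihood_ratio` and `decide`, the threshold, and the example matrices are hypothetical.

```python
import math


def log_likelihood_ratio(states, P0, P1):
    """Log-likelihood ratio log(p1(data) / p0(data)) for an observed state path.

    P0 and P1 are row-stochastic transition matrices (lists of lists) with
    strictly positive entries on every observed transition. The initial
    state is assumed equally likely under both hypotheses.
    """
    llr = 0.0
    for i, j in zip(states, states[1:]):
        llr += math.log(P1[i][j]) - math.log(P0[i][j])
    return llr


def decide(states, P0, P1, threshold=0.0):
    """Return 1 if the data favour chain P1, otherwise 0."""
    return 1 if log_likelihood_ratio(states, P0, P1) > threshold else 0


# Hypothetical example: a "sticky" chain versus an "alternating" chain.
P_sticky = [[0.9, 0.1], [0.1, 0.9]]
P_alternating = [[0.1, 0.9], [0.9, 0.1]]

path = [0, 1, 0, 1, 0]
decision = decide(path, P_sticky, P_alternating)  # favours the alternating chain
```

A threshold of 0 corresponds to a maximum-likelihood decision; other thresholds trade off the two error types.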
  • Item
    MULTI-FEATURE ANALYSIS OF EEG SIGNAL ON SEIZURE PATTERNS AND DEEP NEURAL STRUCTURES FOR PREDICTION OF EPILEPTIC SEIZURES
    (2019) Ma, Xinyuan; Newcomb, Robert W; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    This work investigates EEG signal processing and seizure prediction based on deep learning architectures. The research includes two major parts. In the first part, we use wavelet decomposition to process the signals and extract signal features from the time-frequency bands. The second part examines the machine learning model and deep learning architecture we have developed for seizure pattern analysis. In our design, the extracted feature maps are processed as image inputs into our convolutional neural network (CNN) model. We proposed a combined CNN-LSTM model to directly process the EEG signals with layers functioning as feature extractors. In cross-validation testing, our CNN feature model can reach an accuracy of 96% and our CNN-LSTM model could reach an accuracy of 98%. We also proposed a matching network architecture that employs two parallel multilayer channels to improve sensitivity.
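As a rough illustration of the wavelet decomposition step described in the abstract above, the sketch below performs a multi-level discrete wavelet decomposition and computes one simple per-band feature (energy). The abstract does not say which wavelet or which features were used; the Haar wavelet, the energy feature, and the names `haar_step`, `haar_decompose`, and `band_energies` are assumptions chosen to keep the example short.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Return [detail_1, ..., detail_L, approx_L] for a signal.

    The signal length must be divisible by 2**levels.
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    bands = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        bands.append(detail)  # finest scale first
    bands.append(approx)
    return bands


def band_energies(bands):
    """Sum of squared coefficients in each band, a simple per-band feature."""
    return [sum(v * v for v in band) for band in bands]


# Toy stand-in for a short EEG segment.
segment = [4.0, 2.0, 6.0, 8.0, 1.0, 3.0, 5.0, 7.0]
features = band_energies(haar_decompose(segment, levels=2))
```

Because this Haar transform is orthonormal, the band energies sum to the energy of the original segment, which is a convenient sanity check.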
  • Item
    SURFACE ACOUSTIC WAVE (SAW) PROPAGATION IN NANOSTRUCTURED DEVICES
    (2019) Xu, Kezhen; Iliadis, Agis; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    ZnO/SiO2/Si surface acoustic wave Love mode sensors are considered to be promising high sensitivity sensors. Previous research has tested ZnO/SiO2/Si SAW sensors with selected operating frequency and guiding layer thickness. This investigation is based on experimental data from previous research and uses the theories and equations from that research to evaluate and develop a model of the mass sensitivity of surface acoustic wave (SAW) devices with two different piezoelectric semiconductors, ZnO/SiO2/Si and GaN/SiO2/Si Love mode SAW sensors. The SAW mass sensitivity model developed here examines the mass sensitivity of the SAW device with respect to design parameters such as wavelength, piezoelectric layer thickness, and the two different semiconductors (ZnO and GaN) to obtain optimum mass sensitivity. The mass sensitivity increases as the wavelength increases. The model also shows that the maximum mass sensitivity of GaN-based devices is 10% better than the maximum mass sensitivity of ZnO-based devices.
  • Item
    Decoding auditory brain responses with mutual information and the effects of aging
    (2019) Zan, Peng; Simon, Jonathan Z.; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The ability to segregate and understand speech in complex listening scenarios is an inherent property of the human brain. However, this ability deteriorates as the brain ages. The underlying age-related alteration of neural mechanisms is still unclear. Understanding the subcortical and cortical neural mechanisms of auditory processes might be critical in order to get a better understanding of how they are degraded by age. Importantly, the likely non-linear nature of these auditory processes may conceal important internal mechanisms that might not be captured with traditional linear methodology. This thesis develops a novel non-linear approach based on information theory and investigates the non-linear representation of speech in both the midbrain and the cortex. In this dissertation, midbrain and cortical activities from younger and older listeners are noninvasively recorded with both clean speech (i.e. subjects listening to a single speaker) and with adverse listening conditions (i.e. two competing speakers). Additionally, the effect of informational masking is also investigated. Results from the mutual information analysis suggest an age-related deterioration of the response in the midbrain and a strong effect of the informational masking only in older adults. Conversely, the cortical analysis reveals an exaggerated response in older listeners. Interestingly, this exaggerated response is strongly correlated with behavioral measurements, such as speech-in-noise score and behavioral inhibitory control score. Further analysis also reveals that the exaggerated response in the aging cortex manifests only in the neural representation of the low-frequency speech envelope, while at higher frequencies (60-100 Hz) no differences were seen between younger and older listeners. 
However, the aging cortex demonstrates neural deficits, at such higher frequencies, in suppression of the competing speech in challenging listening conditions, shown by an increasing trend of response level with increasing sound level of the competing speech. In summary, this dissertation develops a novel mutual information approach for analyzing neural recordings, and the results reveal new findings of age-related changes in auditory midbrain and cortical activities.
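For readers unfamiliar with mutual information as a measure of how much a neural response tells us about a stimulus, the following minimal sketch computes a plug-in (histogram) estimate from paired discrete samples. The abstract does not describe the estimator actually used in the dissertation; the function name `mutual_information`, the assumption that both signals have already been binned into discrete symbols, and the toy data are illustrative only. Plug-in estimates are biased upward on small samples, so this is a starting point and not a recommended analysis pipeline.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts.
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Toy example: a binned stimulus and a response that copies it exactly.
stimulus_bins = [0, 1, 0, 1]
response_bins = [0, 1, 0, 1]
bits = mutual_information(stimulus_bins, response_bins)
```

Unlike a linear correlation, this quantity is sensitive to any statistical dependence between the two sequences, which is the motivation given in the abstract for moving beyond linear methods.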
  • Item
    Augmented Deep Representations for Unconstrained Still/Video-based Face Recognition
    (2019) Zheng, Jingxiao; Chellappa, Rama; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Face recognition is one of the active areas of research in computer vision and biometrics. Many approaches have been proposed in the literature that demonstrate impressive performance, especially those based on deep learning. However, unconstrained face recognition with large pose, illumination, occlusion and other variations is still an unsolved problem. Unconstrained video-based face recognition is even more challenging due to the large volume of data to be processed, lack of labeled training data and significant intra/inter-video variations on scene, blur, video quality, etc. Although Deep Convolutional Neural Networks (DCNNs) have provided discriminant representations for faces and achieved performance surpassing humans in controlled scenarios, modifications are necessary for face recognition in unconstrained conditions. In this dissertation, we propose several methods that improve unconstrained face recognition performance by augmenting the representation provided by the deep networks using correlation or contextual information in the data. For unconstrained still face recognition, we present an encoding approach to combine the Fisher vector (FV) encoding and DCNN representations, which is called FV-DCNN. The feature maps from the last convolutional layer in the deep network are encoded by FV into a robust representation, which utilizes the correlation between facial parts within each face. A VLAD-based encoding method called VLAD-DCNN is also proposed as an extension. Extensive evaluations on three challenging face recognition datasets show that the proposed FV-DCNN and VLAD-DCNN perform comparably to or better than many state-of-the-art face verification methods. For the more challenging video-based face recognition task, we first propose an automatic system and model the video-to-video similarity as subspace-to-subspace similarity, where the subspaces characterize the correlation between deep representations of faces in videos. 
In the system, a quality-aware subspace-to-subspace similarity is introduced, where subspaces are learned using quality-aware principal component analysis. Subspaces along with quality-aware exemplars of templates are used to produce the similarity scores between video pairs by a quality-aware principal angle-based subspace-to-subspace similarity metric. The method is evaluated on four video datasets. The experimental results demonstrate the superior performance of the proposed method. To utilize the temporal information in videos, a hybrid dictionary learning method is also proposed for video-based face recognition. The proposed unsupervised approach effectively models the temporal correlation between deep representations of video faces using dynamical dictionaries. A practical iterative optimization algorithm is introduced to learn the dynamical dictionary. Experiments on three video-based face recognition datasets demonstrate that the proposed method can effectively learn robust and discriminative representation for videos and improve the face recognition performance. Finally, to leverage contextual information in videos, we present the Uncertainty-Gated Graph (UGG) for unconstrained video-based face recognition. It utilizes contextual information between faces by conducting graph-based identity propagation between sample tracklets, where identity information is initialized by the deep representations of video faces. UGG explicitly models the uncertainty of the contextual connections between tracklets by adaptively updating the weights of the edge gates according to the identity distributions of the nodes during inference. UGG is a generic graphical model that can be applied at only inference time or with end-to-end training. We demonstrate the effectiveness of UGG with state-of-the-art results on the recently released challenging Cast Search in Movies and IARPA Janus Surveillance Video Benchmark datasets.