Electrical & Computer Engineering
Permanent URI for this community: http://hdl.handle.net/1903/2234
4 results
Search Results
Item Deep Learning with Constraints and Priors for Improved Subject Clustering, Medical Imaging, and Robust Inference (2020) Lin, Wei-An; Chellappa, Rama; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Deep neural networks (DNNs) have achieved significant success in several fields including computer vision, natural language processing, and robot control. The common philosophy behind these successes is the use of large amounts of annotated data and end-to-end networks, with task-specific constraints and priors implicitly incorporated into the trained model without the need for careful feature engineering. However, DNNs have been shown to be vulnerable to distribution shifts and adversarial perturbations, which indicates that such implicit priors and constraints are not sufficient for real-world applications. In this dissertation, we target three applications and design task-specific constraints and priors for improved performance of deep neural networks. We first study the problem of subject clustering, the task of grouping face images of the same person together. We propose to utilize the prior structure in the feature space of DNNs trained for face identification to design a novel clustering algorithm. Specifically, the clustering algorithm exploits the local neighborhood structure of deep representations by exemplar-based learning based on k-nearest neighbors (k-NN). Extensive experiments show promising results for grouping face images according to subject identity. As an example, we apply the proposed clustering algorithm to automatically curate a large-scale face dataset with noisy labels and show that the performance of face recognition DNNs can be significantly improved by training on the curated dataset. Furthermore, we empirically find that the k-NN rule does not capture proper local structures for deep representations when each subject has very few face images.
We then propose to improve upon the exemplar-based approach by a density-aware similarity measure and theoretically show its asymptotic convergence to a density estimator. We conduct experiments on challenging face datasets that show promising results. Second, we study the problem of metal artifact reduction in computed tomography (CT). Unlike typical image restoration tasks such as super-resolution and denoising, metal artifacts in CT images are structured and non-local. Conventional DNNs do not generalize well when metal implants with unseen shapes are present. We find that the imaging process of CT induces a data consistency prior that can be exploited for image enhancement. Based on this observation, we propose a dual-domain learning approach to CT metal artifact reduction. We design and implement a novel Radon inversion layer that allows gradients in the image domain to be backpropagated to the projection domain. Experiments conducted on both simulated datasets and clinical datasets show promising results. Compared to conventional DNN-based models, the proposed dual-domain approach leads to impressive metal artifact reduction and has improved generalization capability. Finally, we study the problem of robust classification. In the past few years, the vulnerability of DNNs to small imperceptible perturbations has been widely studied, which raises concerns about the security and robustness of DNNs against possible threat models. To defend against threat models, Samangouei et al. proposed DefenseGAN, a preprocessing approach which removes adversarial perturbations by projecting the input images onto the learned data prior. However, the projection operation in DefenseGAN is time-consuming and may not yield proper reconstruction when images have complicated textures. We propose an inversion network to constrain the initial estimates of the latent code for input images.
With the proposed constraint, the number of optimization steps in DefenseGAN can be reduced while achieving improved accuracy and robustness. Furthermore, we conduct empirical studies on attack methods that have claimed to break DefenseGAN, which show that on-manifold robustness might be the key factor for ensuring adversarial robustness.

Item OPTIMIZATION ALGORITHMS USING PRIORS IN COMPUTER VISION (2018) Shah, Sohil Atul; Goldstein, Tom; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Over the years, many computer vision models, some inspired by human behavior, have been developed for various applications. However, only a handful of them are popular and widely used. Why? There are two major factors: 1) most of these models do not have any efficient numerical algorithm and hence are computationally very expensive; 2) many models, being too generic, cannot capitalize on problem-specific prior information and thus demand rigorous hyper-parameter tuning. In this dissertation, we design fast and efficient algorithms to leverage application-specific priors to solve unsupervised and weakly-supervised problems. Specifically, we focus on developing algorithms to impose structured priors, model priors and label priors during the inference and/or learning of vision models. In many applications, it is known a priori that a signal is smooth and continuous in space. The first part of this work is focused on improving unsupervised learning mechanisms by explicitly imposing these structured priors in an optimization framework using different regularization schemes. This led to the development of fast algorithms for robust recovery of signals from compressed measurements, image denoising and data clustering. Moreover, by employing a re-descending robust penalty on the structured regularization terms and applying duality, we reduce our clustering formulation to an optimization of a single continuous objective.
This enabled integration of clustering processes in an end-to-end feature learning pipeline. In the second part of our work, we exploit inherent properties of established models to develop efficient solvers for SDP, GAN, and semantic segmentation. We consider models for several different problem classes. a) Certain non-convex models in computer vision (e.g., BQP) are popularly solved using convex SDPs after lifting to a high-dimensional space. However, this computationally expensive approach limits these methods to small matrices. A fast and approximate algorithm is developed that directly solves the original non-convex formulation using biconvex relaxations and known rank information. b) Widely popular adversarial networks are difficult to train as they suffer from instability issues. This is because optimizing adversarial networks corresponds to finding a saddle-point of a loss function. We propose a simple prediction method that enables faster training of various adversarial networks using larger learning rates without any instability problems. c) Semantic segmentation models must learn long-distance contextual information while retaining high spatial resolution at the output. Existing models achieve this at the cost of computationally expensive and memory-intensive training/inference. We design a stacked U-Nets model that can repeatedly process top-down and bottom-up features. Our smallest model exceeds ResNet-101 performance on PASCAL VOC 2012 by 4.5% IoU with ∼7× fewer parameters. Next, we address the problem of learning heterogeneous concepts from internet videos using mined label tags. Given a large number of videos each with multiple concepts and labels, the idea is to teach machines to automatically learn these concepts by leveraging weak labels. We formulate this as a co-clustering problem and develop a novel Bayesian non-parametric weakly supervised Indian buffet process model, which additionally enforces the paired label prior between concepts.
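The prediction idea in item (b) above can be illustrated on a toy saddle-point problem. The sketch below is only an illustration under stated assumptions, not the dissertation's implementation: the bilinear objective L(u, v) = u * v, the step size, and the function names `simultaneous_gda` and `gda_with_prediction` are hypothetical choices, and the update shown (extrapolating the minimizing variable before the maximizing variable takes its step) is one common form of a prediction step; the abstract does not spell out the exact update.

```python
import math


def simultaneous_gda(u, v, lr=0.1, steps=2000):
    """Plain simultaneous gradient descent (on u) / ascent (on v) for L(u, v) = u * v."""
    for _ in range(steps):
        grad_u, grad_v = v, u  # dL/du = v, dL/dv = u
        u, v = u - lr * grad_u, v + lr * grad_v
    return u, v


def gda_with_prediction(u, v, lr=0.1, steps=2000):
    """Alternating updates where v is updated against a predicted (extrapolated) u.

    Assumed form of the prediction step: u_pred = u_new + (u_new - u_old).
    """
    for _ in range(steps):
        u_new = u - lr * v            # descent step on u
        u_pred = u_new + (u_new - u)  # prediction: extrapolate along the last move
        v = v + lr * u_pred           # ascent step on v uses the predicted u
        u = u_new
    return u, v


def distance_to_saddle(point):
    """Distance from (u, v) to the saddle point of L, which sits at the origin."""
    return math.hypot(point[0], point[1])


plain = simultaneous_gda(1.0, 1.0)
predicted = gda_with_prediction(1.0, 1.0)
print("plain GDA distance to saddle:     ", distance_to_saddle(plain))
print("with prediction distance to saddle:", distance_to_saddle(predicted))
```

On this toy objective the plain iteration spirals away from the saddle point, while the iteration with the prediction step spirals toward it at the same step size.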
In the final part of this work we consider an inverse approach: learning data priors from a given model. Specifically, we develop a numerically efficient algorithm for estimating the log likelihood of data samples from GANs. The approximate log-likelihood function is used for outlier detection and data augmentation for training classifiers.

Item DISCRIMINATIVE LEARNING AND RECOGNITION USING DICTIONARIES (2013) Chen, Yi-Chen; Chellappa, Rama; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
In recent years, the theory of sparse representation has emerged as a powerful tool for efficient processing of data in non-traditional ways. This is mainly due to the fact that most signals and images of interest tend to be sparse or compressible in some dictionary. In other words, they can be well approximated by a linear combination of a few elements (also known as atoms) of a dictionary. This dictionary can either be an analytic dictionary composed of wavelets or a Fourier basis, or it can be directly trained from data. It has been observed that dictionaries learned directly from data provide better representation and hence can improve the performance of many practical applications such as restoration and classification. In this dissertation, we study dictionary learning and recognition under supervised, unsupervised, and semi-supervised settings. In the supervised case, we propose an approach to recognize humans in unconstrained videos, where the main challenge is exploiting the identity information in multiple frames and the accompanying dynamic signature. These identity cues include face, body, and motion. Our approach is based on video-dictionaries for face and body. We design video-dictionaries to implicitly encode temporal, pose, and illumination information.
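The notion of approximating a signal by a few dictionary atoms, described above, can be made concrete with a small sketch of greedy matching pursuit, a standard sparse-coding heuristic. This is not the dissertation's algorithm: the toy dictionary, the signal, and the function name `matching_pursuit` are hypothetical and chosen only for illustration, and the atoms are assumed to have unit norm.

```python
import math


def matching_pursuit(signal, atoms, n_nonzero=2, tol=1e-10):
    """Greedily approximate `signal` as a sparse combination of unit-norm `atoms`.

    Returns (coeffs, residual), where coeffs maps atom index -> coefficient.
    """
    residual = list(signal)
    coeffs = {}
    for _ in range(n_nonzero):
        # Correlation of the current residual with every atom.
        scores = [sum(r * a for r, a in zip(residual, atom)) for atom in atoms]
        best = max(range(len(atoms)), key=lambda j: abs(scores[j]))
        if abs(scores[best]) < tol:
            break  # nothing left to explain
        # Accumulate the coefficient and subtract the explained component.
        coeffs[best] = coeffs.get(best, 0.0) + scores[best]
        residual = [r - scores[best] * a for r, a in zip(residual, atoms[best])]
    return coeffs, residual


s = 1.0 / math.sqrt(2.0)
# Toy dictionary: the three coordinate axes plus one diagonal atom (all unit norm).
atoms = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (s, s, 0.0)]
# Signal built as 3 * atoms[2] + 2 * atoms[3].
signal = (2.0 * s, 2.0 * s, 3.0)

coeffs, residual = matching_pursuit(signal, atoms, n_nonzero=2)
print(coeffs, residual)
```

Here two atoms suffice to reconstruct the three-dimensional signal, which is what "sparse in some dictionary" means in this setting. Learned dictionaries, as studied in the dissertation, replace the hand-picked atoms with atoms fitted to training data.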
Next, we propose a novel multivariate sparse representation method that jointly represents all the video data by a sparse linear combination of training data. To increase the ability of our algorithm to learn nonlinearities, we apply kernel methods to learn the dictionaries. Next, we address the problem of matching faces across changes in pose in unconstrained videos. Our approach consists of two methods based on 3D rotation and sparse representation that compensate for changes in pose. We demonstrate the superior performance of our approach over several state-of-the-art algorithms through extensive experiments on unconstrained video datasets. In the unsupervised case, we present an approach that simultaneously clusters images and learns dictionaries from the clusters. The method learns dictionaries in the Radon transform domain. The main feature of the proposed approach is that it provides in-plane rotation and scale invariant clustering, which is useful in many applications such as Content Based Image Retrieval (CBIR). We demonstrate through experiments that the proposed rotation and scale invariant clustering provides not only good retrieval performance but also substantial improvements and robustness compared to traditional Gabor-based and several state-of-the-art shape-based methods. We then extend the dictionary learning problem to a generalized semi-supervised formulation, where each training sample is provided with a set of possible labels and only one label among them is the true one. Such applications can be found in image and video collections where one often has only partially labeled data. For instance, given an image with multiple faces and a caption specifying the names, we can be sure that each of the faces belongs to one of the names specified, while the exact identity of each face is not known. Labeling involves a significant amount of human effort and is expensive.
This has motivated researchers to develop learning algorithms from partially labeled training data. In this work, we develop dictionary learning algorithms that utilize such partially labeled data. The proposed method aims to solve the problem of ambiguously labeled multiclass classification using an iterative algorithm. The dictionaries are updated using either soft (EM-based) or hard decision rules. Extensive evaluations on existing datasets demonstrate that the proposed method performs significantly better than state-of-the-art approaches for learning from ambiguously labeled data. As sparsity plays a major role in our research, we further present a sparse representation-based approach to find the salient views of 3D objects. The salient views are categorized into two groups. The first are boundary representative views that have several visible sides and object surfaces that may be attractive to humans. The second are side representative views that best represent side views of the approximating convex shape. The side representative views are class-specific views and possess the most representative power compared to other within-class views. Using the concept of characteristic view class, we first present a sparse representation-based approach for estimating the boundary representative views. With the estimated boundaries, we determine the side representative views based on a minimum reconstruction error criterion. Furthermore, to evaluate our method, we introduce the notion of geometric dictionaries built from salient views for applications in 3D object recognition, retrieval and sparse-to-full reconstruction.
By a series of experiments on four publicly available 3D object datasets, we demonstrate the effectiveness of our approach over state-of-the-art algorithms and baseline methods.

Item Identification of Air Traffic Flow Segments via Incremental Deterministic Annealing Clustering (2012) Nguyen, Alex T; Barras, John S; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Many of the traffic management decisions and initiatives in air traffic are based on "flows" of traffic in the National Airspace System (NAS), but the actual identification of the location and time of the flow segments is often left to interpretation based on observations of traffic data points over time. Having an automated method of identifying major flow segments can help to target traffic management initiatives, evaluate design of airspace, and enable actions to be taken on the collection of flights in a flow segment rather than on the flights individually. A novel approach is developed to identify the major flow segments of air traffic in the NAS. It consists of a robust method for partitioning 4-dimensional traffic trajectories into a series of great circle segments, followed by clustering of the segments using an Agglomerate Deterministic Annealing clustering algorithm. In addition, a very efficient algorithm to incrementally cluster the segments is developed that takes into account the spatial and temporal properties of the segments, and makes the method very suitable for real-time applications. Further, an enhancement to the algorithm is provided that requires only a small subset of the segments to be clustered, drastically reducing the run time. Results of the clustering technique are shown, highlighting various major traffic flow patterns in the NAS. In addition, organizing the traffic into the flow segments identified using the Incremental Clustering method is shown to potentially reduce the number of conflict points.
An application of the flow information is presented in the form of a Decision Support Tool (DST) that aids traffic managers in establishing and managing Airspace Flow Programs. In addition, the flow segment information is applied to a low-level form of aggregated traffic management, showing that aggregating flights into the flow segments and rerouting the whole flow segment can be performed more efficiently than rerouting individual aircraft separately, and can reduce the number of conflict points. Considerations for implementing these techniques in real-time systems are also discussed.
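For readers unfamiliar with deterministic annealing, the clustering idea named in this abstract can be sketched in its basic textbook form: soft (Gibbs) assignments at a temperature that is gradually lowered, so that cluster centers split as the system cools. The sketch below is an assumption-laden illustration, not the thesis's Agglomerate or incremental variant: it uses a fixed number of clusters, plain 2-D points instead of great circle segments, squared Euclidean distance, and hypothetical names and parameter values (`deterministic_annealing`, `t_start`, `cooling`, and so on).

```python
import math
import random


def deterministic_annealing(points, k, t_start=200.0, t_min=0.01,
                            cooling=0.8, inner_iters=20, seed=0):
    """Basic deterministic annealing clustering with a fixed number of centroids."""
    rng = random.Random(seed)
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    # All centroids start at the global mean; they separate as the temperature drops.
    centroids = [list(mean) for _ in range(k)]
    t = t_start
    while t > t_min:
        # A tiny perturbation lets coincident centroids split once t is low enough.
        centroids = [[c + rng.uniform(-1e-3, 1e-3) for c in cen] for cen in centroids]
        for _ in range(inner_iters):
            # Soft assignment: weight of centroid j for point p is proportional
            # to exp(-||p - c_j||^2 / t).
            weights = []
            for p in points:
                d2 = [sum((a - b) ** 2 for a, b in zip(p, cen)) for cen in centroids]
                d_min = min(d2)  # subtract the minimum for numerical stability
                w = [math.exp(-(d - d_min) / t) for d in d2]
                total = sum(w)
                weights.append([x / total for x in w])
            # Centroid update: weighted mean of all points.
            for j in range(k):
                mass = sum(w[j] for w in weights)
                if mass > 1e-12:
                    centroids[j] = [
                        sum(w[j] * p[d] for w, p in zip(weights, points)) / mass
                        for d in range(dim)
                    ]
        t *= cooling  # cool down
    return sorted(centroids)


# Two small, well-separated groups of made-up 2-D points.
group_a = [(0.0, 0.0), (0.2, -0.1), (-0.1, 0.1), (0.1, 0.2)]
group_b = [(x + 10.0, y + 10.0) for x, y in group_a]
centers = deterministic_annealing(group_a + group_b, k=2)
print(centers)
```

At high temperature both centroids sit at the global mean; once the temperature falls below a critical value they separate and settle near the means of the two groups. Starting from a single effective cluster and splitting during cooling is what makes the method less sensitive to initialization than hard-assignment schemes such as k-means.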