Theses and Dissertations from UMD

Permanent URI for this community: http://hdl.handle.net/1903/2

New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means that there may be up to a 4-month delay in the appearance of a given thesis/dissertation in DRUM.

More information is available at Theses and Dissertations at University of Maryland Libraries.

Search Results

Now showing 1 - 7 of 7
  • Machine Learning of Facial Attributes Using Explainable, Secure and Generative Adversarial Networks
    (2018) Samangouei, Pouya; Chellappa, Rama; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    "Attributes" are referred to abstractions that humans use to group entities and phenomena that have a common characteristic. In machine learning (ML), attributes are fundamental because they bridge the semantic gap between humans and ML systems. Thus, researchers have been using this concept to transform complicated ML systems into interactive ones. However, training the attribute detectors which are central to attribute-based ML systems can still be challenging. It might be infeasible to gather attribute labels for rare combinations to cover all the corner cases, which can result in weak detectors. Also, it is not clear how to fill in the semantic gap with attribute detectors themselves. Finally, it is not obvious how to interpret the detectors' outputs in the presence of adversarial noise. First, we investigate the effectiveness of attributes for bridging the semantic gap in complicated ML systems. We turn a system that does continuous authentication of human faces on mobile phones into an interactive attribute-based one. We employ deep multi-task learning in conjunction with multi-view classification using facial parts to tackle this problem. We show how the proposed system decomposition enables efficient deployment of deep networks for authentication on mobile phones with limited resources. Next, we seek to improve the attribute detectors by using conditional image synthesis. We take a generative modeling approach for manipulating the semantics of a given image to provide novel examples. Previous works condition the generation process on binary attribute existence values. We take this type of approaches one step further by modeling each attribute as a distributed representation in a vector space. These representations allow us to not only toggle the presence of attributes but to transfer an attribute style from one image to the other. Furthermore, we show diverse image generation from the same set of conditions, which was not possible using existing methods with a single dimension per attribute. We then investigate filling in the semantic gap between humans and attribute classifiers by proposing a new way to explain the pre-trained attribute detectors. We use adversarial training in conjunction with an encoder-decoder model to learn the behavior of binary attribute classifiers. We show that after our proposed model is trained, one can see which areas of the image contribute to the presence/absence of the target attribute, and also how to change image pixels in those areas so that the attribute classifier decision changes in a consistent way with human perception. Finally, we focus on protecting the attribute models from un-interpretable behaviors provoked by adversarial perturbations. These behaviors create an inexplainable semantic gap since they are visually unnoticeable. We propose a method based on generative adversarial networks to alleviate this issue. We learn the training data distribution that is used to train the core classifier and use it to detect and denoise test samples. We show that the method is effective for defending facial attribute detectors.
  • Face Recognition from Weakly Labeled Data
    (2016) Chen, Ching-Hui; Chellappa, Rama; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Recognizing the identity of a face or a person in the media usually requires a large amount of training data to design robust classifiers, which in turn demands a great amount of human annotation effort. Alternatively, weakly labeled data are publicly available, but the labels can be ambiguous or noisy. For instance, names in the caption of a news photo provide possible candidates for faces appearing in the image. Names in screenplays are only weakly associated with faces in the videos. Since weakly labeled data are not explicitly labeled by humans, robust learning methods that use weakly labeled data should suppress the impact of noisy instances or automatically resolve the ambiguities in noisy labels. We propose a method for character identification in a TV series. The proposed method uses labels extracted automatically by associating the faces with names in the transcripts. Such weakly labeled data often have erroneous labels resulting from face detection and synchronization errors. Our approach achieves robustness to noisy labeling by utilizing several features. We construct track nodes from face and person tracks and utilize information from facial and clothing appearances. We discover the video structure for effective inference by constructing a minimum-distance spanning tree (MST) from the track nodes. Hence, track nodes of similar appearance become adjacent to each other and are likely to have the same identity. The non-local cost aggregation step thus serves as a noise suppression step to reliably recognize the identity of the characters in the video. Another type of weakly labeled data results from labeling ambiguities. In other words, a training sample can have more than one label, and typically one of the labels is the true label. For instance, a news photo is usually accompanied by a caption, and the names provided in the caption can be used as candidate labels for the faces appearing in the photo. Learning an effective subject classifier from ambiguously labeled data is called ambiguously labeled learning. We propose a matrix completion framework for predicting the actual labels from the ambiguously labeled instances, and a standard supervised classifier that subsequently learns from the disambiguated labels to classify new data. We generalize this matrix completion framework to handle labeling imbalance, avoiding domination by frequent labels. In addition, an iterative candidate elimination step is integrated into the proposed approach to improve ambiguity resolution. Recently, video-based face recognition techniques have received significant attention since faces in a video provide diverse exemplars for constructing a robust representation of the target (i.e., the subject of interest). Nevertheless, the target face in the video is usually annotated with minimal human effort (i.e., a single bounding box in a video frame). Although face tracking techniques can be utilized to associate faces within a single video shot, they are ineffective for associating faces across multiple video shots. To fully utilize faces of a target in multiple-shot videos, we propose a target face association (TFA) method to obtain a set of images of the target face; these associated images are then utilized to construct a robust representation of the target for improving the performance of video-based face recognition. One of the most important applications of video-based face recognition is outdoor video surveillance using a camera network. Face recognition in outdoor environments is a challenging task due to illumination changes, pose variations, and occlusions. We present a taxonomy of camera networks and discuss several techniques for continuous tracking of faces acquired by an outdoor camera network, as well as a face matching algorithm. Finally, we demonstrate a real-time video surveillance system that uses pan-tilt-zoom (PTZ) cameras to perform pedestrian tracking, localization, face detection, and face recognition.
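A minimal sketch of the tree construction described above: track nodes carrying appearance descriptors are linked by a minimum spanning tree over their pairwise distances, so that similar-looking tracks become adjacent for label aggregation. The feature matrix is a hypothetical placeholder; this is not the dissertation's code.

```python
# Sketch: minimum spanning tree over track-node appearance distances.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def track_mst_edges(track_features):
    """track_features: (n_tracks, d) face + clothing appearance descriptors."""
    dist = squareform(pdist(track_features))        # dense pairwise distance matrix
    mst = minimum_spanning_tree(dist)               # sparse matrix holding the tree edges
    rows, cols = mst.nonzero()
    return list(zip(rows.tolist(), cols.tolist()))  # edges for non-local cost aggregation

edges = track_mst_edges(np.random.rand(10, 64))     # 10 hypothetical tracks, 64-D features
```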
  • Restoration and Domain Adaptation for Unconstrained Face Recognition
    (2014) Ni, Jie; Chellappa, Rama; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Face recognition (FR) has received great attention, and tremendous progress has been made during the past two decades. While FR at close range under controlled acquisition conditions has achieved a high level of performance, FR at a distance in unconstrained environments remains a largely unsolved problem. This is because images collected from a distance usually suffer from blur, poor illumination, pose variation, etc. In this dissertation, we present models and algorithms to compensate for these variations and improve the performance of FR at a distance. Blur is a common factor contributing to the degradation of images collected from a distance, e.g., defocus blur due to long-range acquisition and motion blur due to the movement of subjects. For this purpose, we study the image deconvolution problem. This is an ill-posed problem, and solutions are usually obtained by exploiting prior information about the desired output image to reduce ambiguity, typically through a Bayesian framework. In this dissertation, we consider the role of an example-driven manifold prior in addressing the deconvolution problem. Specifically, we incorporate unlabeled image data of the object class in the form of a patch manifold to effectively regularize the inverse problem. We propose both parametric and non-parametric approaches to implicitly estimate the manifold prior from the given unlabeled data. Extensive experiments show that our method performs better than many competitive image deconvolution methods. More often, variations in images collected at a distance are difficult to address through physical models of individual degradations. For this problem, we utilize domain adaptation methods to adapt recognition systems to the test data. Domain adaptation addresses the problem where data instances of a source domain are distributed differently from those of a target domain. We focus on the unsupervised domain adaptation problem, where labeled data are not available in the target domain. We propose to interpolate subspaces through dictionary learning to link the source and target domains. These subspaces are able to capture the intrinsic domain shift and form a shared feature representation for cross-domain recognition. Experimental results on publicly available datasets demonstrate the effectiveness of our approach for face recognition across pose, blur, and illumination variations, and for cross-dataset object classification. Most existing domain adaptation methods assume a homogeneous source domain, which is usually modeled by a single subspace. Yet in practice, we are often given mixed source data with different inner characteristics. Modeling these source data as a single domain would potentially deteriorate the adaptation performance, as the adaptation procedure needs to account for the large within-class variations in the source domain. For this problem, we propose two approaches to mitigate the heterogeneity in source data. We first present an approach for selecting a subset of source samples that is more similar to the target domain, to avoid negative knowledge transfer. We then consider the scenario in which the heterogeneous source data are due to multiple latent domains. For this purpose, we derive a domain clustering framework to recover the latent domains for improved adaptation. Moreover, we formulate submodular objective functions that can be solved by an efficient greedy method. Experimental results show that our approaches compare favorably with the state of the art.
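The dissertation interpolates subspaces through dictionary learning; as a rough illustration of the interpolated-subspace idea only, the sketch below blends PCA bases of the source and target domains, re-orthonormalizes each step, and concatenates projections into a shared representation. This simplification is not the author's formulation.

```python
# Sketch: bridge source and target domains with interpolated subspaces.
import numpy as np

def pca_basis(X, k):
    """Top-k principal directions of data X with shape (n_samples, d)."""
    _, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    return Vt[:k].T                                 # (d, k) orthonormal basis

def subspace_path(Bs, Bt, n_steps=4):
    """Intermediate subspaces between source basis Bs and target basis Bt."""
    path = []
    for t in np.linspace(0.0, 1.0, n_steps):
        Q, _ = np.linalg.qr((1 - t) * Bs + t * Bt)  # blend, then re-orthonormalize
        path.append(Q)
    return path

def shared_feature(x, path):
    """Concatenated projections of x onto every intermediate subspace."""
    return np.concatenate([B.T @ x for B in path])
```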
  • Face Recognition and Facial Attribute Analysis from Unconstrained Visual Data
    (2014) Ho, Huy Tho; Chellappa, Rama; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Analyzing human faces from visual data has been one of the most active research areas in the computer vision community. However, it is a very challenging problem in unconstrained environments due to variations in pose, illumination, expression, occlusion, and blur between training and testing images. The task becomes even more difficult when only a limited number of images per subject are available for modeling these variations. In this dissertation, different techniques for performing classification of human faces as well as other facial attributes such as expression, age, gender, and head pose in uncontrolled settings are investigated. In the first part of the dissertation, a method for reconstructing the virtual frontal view from a given non-frontal face image using Markov Random Fields (MRFs) and an efficient variant of the Belief Propagation (BP) algorithm is introduced. In the proposed approach, the input face image is divided into a grid of overlapping patches, and a globally optimal set of local warps is estimated to synthesize the patches at the frontal view. A set of possible warps for each patch is obtained by aligning it with images from a training database of frontal faces. The alignments are performed efficiently in the Fourier domain using an extension of the Lucas-Kanade (LK) algorithm that can handle illumination variations. The problem of finding the optimal warps is then formulated as a discrete labeling problem using an MRF. The reconstructed frontal face image can then be used with any face recognition technique. The two main advantages of our method are that it requires neither manually selected facial landmarks nor head pose estimation. In the second part, the task of face recognition in unconstrained settings is formulated as a domain adaptation problem. The domain shift is accounted for by deriving a latent subspace or domain, which jointly characterizes the multifactor variations using appropriate image formation models for each factor. The latent domain is defined as a product of Grassmann manifolds based on the underlying geometry of the tensor space, and recognition is performed across the domain shift using statistics consistent with the tensor geometry. More specifically, given a face image from the source or target domain, multiple images of that subject are first synthesized under different illuminations, blur conditions, and 2D perturbations to form a tensor representation of the face. The orthogonal matrices obtained from the decomposition of this tensor, where each matrix corresponds to a factor variation, are used to characterize the subject as a point on a product of Grassmann manifolds. For cases with only one image per subject in the source domain, the identity of target domain faces is estimated using the geodesic distance on product manifolds. When multiple images per subject are available, an extension of kernel discriminant analysis is developed using a novel kernel based on the projection metric on product spaces. Furthermore, a probabilistic approach to the problem of classifying image sets on product manifolds is introduced. Understanding attributes such as expression, age class, and gender from face images has many applications in multimedia processing, including content personalization, human-computer interaction, and facial identification. To achieve good performance in these tasks, it is important to be able to extract pertinent visual structures from the input data. In the third part of the dissertation, a fully automatic approach for performing classification of facial attributes based on hierarchical feature learning using sparse coding is presented. The proposed approach is generative in the sense that it does not use label information in the process of feature learning. As a result, the same feature representation can be applied to different tasks such as expression, age, and gender classification. Final classification is performed by a linear SVM trained with the corresponding labels for each task. The last part of the dissertation presents an automatic algorithm for determining the head pose from a given face image. The face image is divided into a regular grid and represented by dense SIFT descriptors extracted from the grid points. Random Projection (RP) is then applied to reduce the dimension of the concatenated SIFT descriptor vector. Classification and regression using Support Vector Machines (SVMs) are combined in order to obtain an accurate estimate of the head pose. The advantage of the proposed approach is that it does not require facial landmarks such as the eye corners, mouth corners, and nose tip to be extracted from the input face image, as many other methods do.
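The head-pose pipeline in the last part (dense SIFT, random projection, then SVM classification combined with regression) can be outlined with scikit-learn as below. The descriptor matrix and pose labels are synthetic placeholders; SIFT extraction is assumed to have happened upstream, and this is a sketch rather than the author's implementation.

```python
# Sketch: random projection + SVM classification/regression for head pose.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.random_projection import GaussianRandomProjection
from sklearn.svm import SVC, SVR

X = np.random.rand(200, 8192)                # stand-in for concatenated dense SIFT vectors
y_bin = np.random.randint(0, 5, 200)         # coarse pose bins (hypothetical)
y_deg = np.random.uniform(-90.0, 90.0, 200)  # continuous yaw in degrees (hypothetical)

coarse = make_pipeline(GaussianRandomProjection(n_components=256), SVC())
fine = make_pipeline(GaussianRandomProjection(n_components=256), SVR())
coarse.fit(X, y_bin)                         # classification narrows the pose range,
fine.fit(X, y_deg)                           # regression refines the angle within it
```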
  • Statistical/Geometric Techniques for Object Representation and Recognition
    (2009) Biswas, Soma; Chellappa, Rama; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Object modeling and recognition are key areas of research in computer vision and graphics with a wide range of applications. Though research in these areas is not new, traditionally most of it has focused on analyzing problems under controlled environments. The challenges posed by real-life applications demand more general and robust solutions. The wide variety of objects with large intra-class variability makes the task very challenging. The difficulty in modeling and matching objects also varies depending on the input modality. In addition, the easy availability of sensors and storage has resulted in a tremendous increase in the amount of data that needs to be processed, which requires efficient algorithms suitable for large databases. In this dissertation, we address some of the challenges involved in modeling and matching objects in realistic scenarios. Object matching in images requires accounting for large variability in appearance due to changes in illumination and viewpoint. Any real-world object is characterized by its underlying shape and albedo, which, unlike the image intensity, are insensitive to changes in illumination conditions. We propose a stochastic filtering framework for estimating object albedo from a single intensity image by formulating the albedo estimation as an image estimation problem. We also show how this albedo estimate can be used for illumination-insensitive object matching and for more accurate shape recovery from a single image using the standard shape-from-shading formulation. We start with the simpler problem where the pose of the object is known and only the illumination varies. We then extend the proposed approach to handle unknown pose in addition to illumination variations. We also use the estimated albedo maps for another important application: recognizing faces across age progression. Many approaches that address the problem of modeling and recognizing objects from images assume that the underlying objects have diffuse texture. However, most real-world objects exhibit a combination of diffuse and specular properties. We propose an approach for separating the diffuse and specular reflectance components of a given color image so that the algorithms proposed for objects of diffuse texture become applicable to a much wider range of real-world objects. Representing and matching the 2D and 3D geometry of objects is also an integral part of object matching, with applications in gesture recognition, activity classification, trademark and logo recognition, etc. The challenge in matching 2D/3D shapes lies in accounting for the different rigid and non-rigid deformations, large intra-class variability, noise, and outliers. In addition, since shapes are usually represented as a collection of landmark points, the shape matching algorithm also has to deal with the challenges of missing or unknown correspondence across these data points. We propose an efficient shape indexing approach where the different feature vectors representing the shape are mapped to a hash table. For a query shape, we show how similar shapes in the database can be efficiently retrieved without the need to establish correspondence, making the algorithm extremely fast and scalable. We also propose an approach for matching and registering 3D point cloud data across unknown or missing correspondence using an implicit surface representation. Finally, we discuss possible future directions of this research.
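One way to picture the correspondence-free shape indexing described above is a hash table keyed by a sign pattern of the shape's feature vector (a random-hyperplane hash): similar feature vectors tend to collide, so retrieval is a bucket lookup with no point-to-point matching. This is an illustrative stand-in, not the dissertation's indexing scheme.

```python
# Sketch: hash-table index over shape feature vectors.
import numpy as np
from collections import defaultdict

class ShapeIndex:
    def __init__(self, dim, n_bits=16, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((n_bits, dim))  # random hyperplanes
        self.table = defaultdict(list)

    def _key(self, v):
        return tuple((self.planes @ v > 0).astype(int))   # sign pattern = bucket key

    def add(self, shape_id, v):
        self.table[self._key(v)].append(shape_id)

    def query(self, v):
        return self.table[self._key(v)]                   # candidates, no correspondence search

index = ShapeIndex(dim=32)
index.add("logo_17", np.random.rand(32))                  # hypothetical shape descriptor
```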
  • Characterization and Classification of Faces across Age Progression
    (2009) Ramanathan, Narayanan; Chellappa, Rama; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Facial aging, a new dimension that has recently been added to the problem of face recognition, poses interesting theoretical and practical challenges to the research community. How do humans perceive age? What constitutes an age-invariant signature for faces? How do we model facial growth across different ages? How do facial aging effects impact recognition performance? This thesis provides a thorough overview of the problem of facial aging and addresses the aforementioned questions. We propose a craniofacial growth model that characterizes growth-related shape variations observed in human faces during the formative years (0-18 years). The craniofacial growth model draws inspiration from the 'revised' cardioidal strain transformation model proposed in psychophysics and further incorporates age-based anthropometric evidence collected on facial growth during the formative years. Identifying a set of fiducial features on faces, we characterize facial growth by means of growth parameters estimated on the fiducial features. We illustrate how the growth-related transformations observed in facial proportions can be studied by means of linear and non-linear equations in the facial growth parameters, which subsequently help in computing the growth parameters. The proposed growth model implicitly accounts for factors such as gender, ethnicity, the individual's age group, etc. Predicting one's appearance across ages and performing face verification across ages are some of the intended applications of the model. Next, we propose a two-fold approach toward modeling facial aging in adults. First, we develop a shape transformation model that is formulated as a physically-based parametric muscle model that captures the subtle deformations facial features undergo with age. The model implicitly accounts for the physical properties and geometric orientations of the individual facial muscles. Next, we develop an image-gradient-based texture transformation function that characterizes facial wrinkles and other skin artifacts often observed at different ages. Facial growth statistics (both in terms of shape and texture) play a crucial role in developing the aforementioned transformation models. From a database that comprises pairs of age-separated face images of many individuals, we extract age-based facial measurements across key fiducial features and further study textural variations across ages. We present experimental results that illustrate the applications of the proposed facial aging model in tasks such as face verification and facial appearance prediction across ages. How sensitive are face verification systems to facial aging effects? How does age progression affect the similarity between a pair of face images of an individual? We develop a Bayesian age-difference classifier that classifies face images of individuals based on age differences and performs face verification across age progression. Further, we study the similarity of faces across age progression. Since age-separated face images invariably differ in illumination and pose, we propose pre-processing methods for minimizing such variations. Experimental results using a database comprising pairs of face images retrieved from the passports of 465 individuals are presented. The verification system, for faces separated by as many as 9 years, attains an equal error rate of 8.5%.
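The cardioidal strain transformation that the growth model builds on remaps each fiducial point radially about a growth origin, with the angle measured from the vertical axis. Below is a minimal sketch of that base transformation only; the dissertation additionally estimates per-feature growth parameters from anthropometric data, which are omitted here.

```python
# Sketch: cardioidal strain applied to 2D facial landmarks.
import numpy as np

def cardioidal_strain(points, origin, k=0.05):
    """R' = R * (1 + k * (1 - cos(theta))); larger k models more growth."""
    d = points - origin
    r = np.hypot(d[:, 0], d[:, 1])
    theta = np.arctan2(d[:, 0], d[:, 1])       # angle measured from the vertical axis
    r_new = r * (1.0 + k * (1.0 - np.cos(theta)))
    return origin + np.stack([r_new * np.sin(theta),
                              r_new * np.cos(theta)], axis=1)
```

With k = 0 the transformation is the identity; increasing k pushes points near the bottom of the face (theta near 180 degrees) outward more than those near the top, mimicking the downward-and-outward growth of the lower face during the formative years.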
  • Recognizing Human Faces: Physical Modeling and Pattern Classification
    (2008-03-03) Aggarwal, Gaurav; Chellappa, Rama; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Although significant work has been done in the field of face recognition, the performance of state-of-the-art face recognition algorithms is not good enough to be effective in operational systems. Most algorithms work well for controlled images but are quite susceptible to changes in illumination, pose, etc. In this dissertation, we propose methods that address these issues in order to recognize faces in more realistic scenarios. The developed approaches show the importance of physical modeling, contextual constraints, and pattern classification for this task. For still-image-based face recognition, we develop an algorithm to recognize faces illuminated by arbitrarily placed, multiple light sources, given just a single image. Though the problem is ill-posed in its generality, linear approximations to the subspace of Lambertian images in combination with rank constraints on the unknown facial shape and albedo are used to make it tractable. In addition, we develop a purely geometric illumination-invariant matching algorithm that makes use of the bilateral symmetry of human faces. In particular, we prove that the set of images of bilaterally symmetric objects can be partitioned into equivalence classes such that it is always possible to distinguish between two objects belonging to different equivalence classes using just one image per object. For recognizing faces in videos, the challenge lies in suitably characterizing faces using the information available in the video. We propose a method that models a face as a linear dynamical system whose appearance changes with pose. Though the proposed method performs very well on the available datasets, it does not explicitly take the 3D structure or illumination conditions into account. To address these issues, we propose an algorithm to perform 3D facial pose tracking in videos. The approach combines the structural advantages of geometric modeling with the statistical advantages of particle-filter-based inference to recover the 3D configuration of facial features in each frame of the video. The recovered 3D configuration parameters are further used to recognize faces in videos. From a pattern classification point of view, automatic face recognition presents a unique challenge due to the presence of just one (or a few) sample(s) per identity. To address this, we develop a cohort-based framework that makes use of the large number of non-match samples present in the database to improve verification and identification performance.
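The claim that Lambertian images lie near a low-dimensional linear subspace, which the still-image algorithm exploits, can be checked numerically: render one synthetic surface under many random light directions and inspect the singular values of the resulting image matrix. All quantities below are synthetic; this is an illustration of the subspace property, not the dissertation's algorithm.

```python
# Sketch: Lambertian images of one surface occupy a low-dimensional subspace.
import numpy as np

rng = np.random.default_rng(0)
n_pix, n_lights = 500, 100
normals = rng.standard_normal((n_pix, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)  # unit surface normals
albedo = rng.uniform(0.2, 1.0, n_pix)
lights = rng.standard_normal((3, n_lights))                # random light directions

images = albedo[:, None] * np.maximum(normals @ lights, 0.0)  # I = rho * max(n . s, 0)

s = np.linalg.svd(images, compute_uv=False)
print(np.round(s[:10] / s[0], 3))   # energy concentrates in a handful of directions
```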