Multimodal Learning and Its Application to Mobile Active Authentication

Zhang, Heng

Multimodal Learning and Its Application to Mobile Active Authentication

Files

Zhang_umd_0117E_18142.pdf (8.69 MB)

No. of downloads: 315

Date

2017

Authors

Zhang, Heng

Advisor

Chellappa, Rama

DRUM DOI

https://doi.org/10.13016/M2ST7DX56

Abstract

Mobile devices are becoming increasingly popular due to their flexibility and convenience in managing personal information such as bank accounts, profiles and passwords. With the increasing use of mobile devices comes the issue of security as the loss of a smartphone would compromise the personal information of the user.

Traditional methods for authenticating users on mobile devices are based on passwords or fingerprints. As long as mobile devices remain active, they do not incorporate any mechanisms for verifying if the user originally authenticated is still the user in control of the mobile device. Thus, unauthorized individuals may improperly obtain access to personal information of the user if a password is compromised or if a user does not exercise adequate vigilance after initial authentication on a device. To deal with this problem, active authentication systems have been proposed in which users are continuously monitored after the initial access to the mobile device. Active authentication systems can capture users' data (facial image data, screen touch data, motion data, etc) through sensors (camera, touch screen, accelerometer, etc), extract features from different sensors' data, build classification models and authenticate users via comparing additional sensor data against the models.

Mobile active authentication can be viewed as one application of the more general problem, namely, multimodal classification. The idea of multimodal classification is to utilize multiple sources (modalities) measuring the same instance to improve the overall performance compared to using a single source (modality). Multimodal classification also arises in many computer vision tasks such as image classification, RGBD object classification and scene recognition.

In this dissertation, we not only present methods and algorithms related to active authentication problems, but also propose multimodal recognition algorithms based on low-rank and joint sparse representations as well as multimodal metric learning algorithm to improve multimodal classification performance. The multimodal learning algorithms proposed in this dissertation make no assumption about the feature type or applications, thus they can be applied to various recognition tasks such as mobile active authentication, image classification and RGBD recognition.

First, we study the mobile active authentication problem by exploiting a dataset consisting of 50 users' face captured by the phone's frontal camera and screen touch data sensed by the screen for evaluating active authentication algorithms developed under this research. The dataset is named as UMD Active Authentication (UMDAA) dataset. Details on data preprocessing and feature extraction for touch data and face data are described respectively.

Second, we present an approach for active user authentication using screen touch gestures by building linear and kernelized dictionaries based on sparse representations and associated classifiers. Experiments using the screen touch data components of UMDAA dataset as well as two other publicly available screen touch datasets show that the dictionary-based classification method compares favorably to those discussed in the literature. Experiments done using screen touch data collected in three different sessions show a drop in performance when the training and test data come from different sessions. This suggests a need for applying domain adaptation methods to further improve the performance of the classifiers.

Third, we propose a domain adaptive sparse representation-based classification method that learns projections of data in a space where the sparsity of data is maintained. We provide an efficient iterative procedure for solving the proposed optimization problem. One of the key features of the proposed method is that it is computationally efficient as learning is done in the lower-dimensional space. Various experiments on UMDAA dataset show that our method is able to capture the meaningful structure of data and can perform significantly better than many competitive domain adaptation algorithms.

Fourth, we propose low-rank and joint sparse representations-based multimodal recognition. Our formulations can be viewed as generalized versions of multivariate low-rank and sparse regression, where sparse and low-rank representations across all the modalities are imposed. One of our methods takes into account coupling information within different modalities simultaneously by enforcing the common low-rank and joint sparse representation among each modality's observations. We also modify our formulations by including an occlusion term that is assumed to be sparse. The alternating direction method of multipliers is proposed to efficiently solve the proposed optimization problems. Extensive experiments on UMDAA dataset, WVU multimodal biometrics dataset and Pascal-Sentence image classification dataset show that that our methods provide better recognition performance than other feature-level fusion methods.

Finally, we propose a hierarchical multimodal metric learning algorithm for multimodal data in order to improve multimodal classification performance. We design metric for each modality as a product of two matrices: one matrix is modality specific, the other is enforced to be shared by all the modalities. The modality specific projection matrices capture the varying characteristics exhibited by multiple modalities and the common projection matrix establishes the relationship of the distance metrics corresponding to multiple modalities. The learned metrics significantly improves classification accuracy and experimental results of tagged image classification problem as well as various RGBD recognition problems show that the proposed algorithm outperforms existing learning algorithms based on multiple metrics as well as other state-of-the-art approaches tested on these datasets. Furthermore, we make the proposed multimodal metric learning algorithm non-linear by using kernel methods.

URI (handle)

http://hdl.handle.net/1903/19782

Collections

UMD Theses and Dissertations
Electrical & Computer Engineering Theses and Dissertations

Full item page