Learning Visual Classifiers From Limited Labeled Images
Pillai, Jaishanker K.
Recognizing humans and their activities from images and video is one of the key goals of computer vision. While supervised learning algorithms like Support Vector Machines and Boosting have offered robust solutions, they require large amounts of labeled data for good performance. It is often difficult to acquire large labeled datasets due to the significant human effort involved in data annotation. However, it is considerably easier to collect unlabeled data due to the availability of inexpensive cameras and large public databases like Flickr and YouTube. In this dissertation, we develop efficient machine learning techniques for visual classification from a small amount of labeled training data by utilizing the structure in the testing data, labeled data in a different domain, and unlabeled data. This dissertation has three main parts. In the first part of the dissertation, we consider how multiple noisy samples available during testing can be utilized to perform accurate visual classification. Such multiple samples are readily available in video-based recognition problems, which are commonly encountered in visual surveillance. Specifically, we study the problem of unconstrained human recognition from iris images. We develop a Sparse Representation-based selection and recognition scheme, which learns the underlying structure of clean images. This learned structure is utilized to develop a quality measure, and a quality-based fusion scheme is proposed to combine the varying evidence. Furthermore, we extend the method to incorporate privacy, an important requirement in practical biometric applications, without significantly affecting the recognition performance. In the second part, we analyze the problem of utilizing labeled data in a different domain to aid visual classification. We consider the problem of shifts in acquisition conditions during training and testing, which is very common in iris biometrics.
In particular, we study the sensor mismatch problem, where the training samples are acquired using a sensor much older than the one used for testing. We provide one of the first solutions to this problem, a kernel learning framework to adapt iris data collected from one sensor to another. Extensive evaluations on iris data from multiple sensors demonstrate that the proposed method leads to considerable improvement in cross-sensor recognition accuracy. Furthermore, since the proposed technique requires minimal changes to the iris recognition pipeline, it can easily be incorporated into existing iris recognition systems. In the last part of the dissertation, we analyze how unlabeled data available during training can assist visual classification applications. Here, we consider still image-based vision applications involving humans, where explicit motion cues are not available. A human pose often conveys not only the configuration of the body parts, but also implicit predictive information about the ensuing motion. We propose a probabilistic framework to infer this dynamic information associated with a human pose, using unlabeled and unsegmented videos available during training. The inference problem is posed as a non-parametric density estimation problem on non-Euclidean manifolds. Since direct modeling is intractable, we develop a data-driven approach, estimating the density for the test sample under consideration. Statistical inference on the estimated density provides us with quantities of interest like the most probable future motion of the human and the amount of motion information