Domain Transfer Learning for Object and Action Recognition

dc.contributor.advisor: Chellappa, Rama (en_US)
dc.contributor.author: Zheng, Jingjing (en_US)
dc.contributor.department: Electrical Engineering (en_US)
dc.contributor.publisher: Digital Repository at the University of Maryland (en_US)
dc.contributor.publisher: University of Maryland (College Park, Md.) (en_US)
dc.date.accessioned: 2016-02-09T06:32:27Z
dc.date.available: 2016-02-09T06:32:27Z
dc.date.issued: 2015 (en_US)
dc.description.abstract: Visual recognition has always been a fundamental problem in computer vision. Its task is to learn visual categories using labeled training data and then identify unlabeled new instances of those categories. However, due to the large variations in visual data, visual recognition remains a challenging problem. Handling these variations is important for real-world applications, where unconstrained data acquisition scenarios are widely prevalent.

In this dissertation, we first address the variations between training and testing data. In particular, for cross-domain object recognition, we propose a Grassmann manifold-based domain adaptation approach that models the domain shift using the geodesic connecting the source and target domains. We measure the distance between two data points from different domains by integrating the distances of their projections onto all the intermediate subspaces along the geodesic. By exploiting all the intermediate subspaces along the geodesic, the proposed approach produces a more accurate metric.

For cross-view action recognition, we present two effective approaches to learn transferable dictionaries and view-invariant sparse representations. In the first approach, we learn a set of transferable dictionaries, one per camera view. The dictionaries are learned simultaneously from sets of correspondence videos taken from different views, encouraging each video in a correspondence set to have the same sparse representation. In the second approach, we relax this constraint and only encourage correspondence videos to have similar sparse representations. In addition, we learn a common dictionary that is incoherent to the view-specific dictionaries. The view-specific dictionaries are learned for individual views, while the common dictionary is shared across views. In this way, we can align view-specific features in the sparse feature spaces spanned by the view-specific dictionaries and transfer view-shared features in the sparse feature space spanned by the common dictionary.

To handle more general variations in captured images, we also exploit semantic information to learn discriminative feature representations for visual recognition. Class labels are often organized in a hierarchical taxonomy based on their semantic meanings. We propose a novel multi-layer hierarchical dictionary learning framework for region tagging. Specifically, we learn a node-specific dictionary for each semantic label in the taxonomy and preserve the hierarchical semantic structure in the relationships among these node dictionaries. Our approach can also transfer knowledge from semantic labels at higher levels to help learn the classifiers for semantic labels at lower levels.

Moreover, we exploit semantic attributes to boost the performance of visual recognition. We encode objects or actions based on attributes that describe them as high-level concepts. We consider two types of attributes: attributes generated by humans, and data-driven attributes extracted from data using dictionary learning methods. Attribute-based representations may exhibit variations due to noisy and redundant attributes. We therefore propose a discriminative and compact attribute-based representation by selecting a subset of discriminative attributes from a large attribute set. Three attribute selection criteria are proposed and formulated as a submodular optimization problem. A greedy optimization algorithm is presented, and its solution is guaranteed to be at least a (1-1/e)-approximation to the optimum. (en_US)
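As a rough illustration of the geodesic-based distance described in the abstract, the sketch below samples intermediate subspaces along the Grassmann geodesic between a source and a target subspace and accumulates the distance between the projections of two points. It uses the standard principal-angle parametrization of the geodesic; the function names, the number of sampling steps, and the discretization of the integral are illustrative assumptions, not the dissertation's exact formulation.

```python
import numpy as np
from scipy.linalg import null_space, svd


def intermediate_subspace(Us, Ut, t):
    """Subspace at position t in [0, 1] on the Grassmann geodesic from Us to Ut.

    Us, Ut: D x d matrices with orthonormal columns (e.g. PCA bases of the
    source and target domains), using the standard principal-angle
    parametrization of the geodesic.
    """
    Rs = null_space(Us.T)                      # orthogonal complement of Us, D x (D-d)
    V1, cos_theta, V2t = svd(Us.T @ Ut)        # principal angles between the two subspaces
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    sin_theta = np.maximum(np.sin(theta), 1e-12)
    # Component of Ut lying in the complement of Us, rescaled so the geodesic
    # reaches (a rotation of) Ut at t = 1.
    V3 = -Rs.T @ Ut @ V2t.T @ np.diag(1.0 / sin_theta)
    return Us @ V1 @ np.diag(np.cos(t * theta)) - Rs @ V3 @ np.diag(np.sin(t * theta))


def geodesic_distance(x_src, x_tgt, Us, Ut, num_steps=10):
    """Approximate the integrated projection distance between two points by
    sampling num_steps subspaces along the geodesic (a simple discretization
    of the integral over all intermediate subspaces)."""
    total = 0.0
    for t in np.linspace(0.0, 1.0, num_steps):
        Phi = intermediate_subspace(Us, Ut, t)
        total += np.linalg.norm(Phi.T @ x_src - Phi.T @ x_tgt)
    return total / num_steps
```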
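The first dictionary-learning approach described above enforces identical sparse codes for correspondence videos across views. One simple way to see that constraint in action is to concatenate the per-view features of each correspondence pair and learn a single dictionary on the stacked vectors: the learned dictionary then splits into per-view blocks that share the same sparse codes. The sketch below uses scikit-learn's generic dictionary learner for this; the feature dimensions, dictionary size, and sparsity penalty are illustrative assumptions, and the dissertation's actual formulation and optimization are not reproduced here.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Toy correspondence data: n videos observed from two views, with a
# d-dimensional feature per view (random here, purely for illustration).
rng = np.random.default_rng(0)
n, d, n_atoms = 200, 50, 30
X_view1 = rng.standard_normal((n, d))
X_view2 = rng.standard_normal((n, d))

# Stack the per-view features of each correspondence pair into one sample,
# so the learned sparse code is shared by both views of the same video.
X_stacked = np.hstack([X_view1, X_view2])

learner = DictionaryLearning(n_components=n_atoms, alpha=1.0, max_iter=20,
                             transform_algorithm="lasso_lars", random_state=0)
codes = learner.fit_transform(X_stacked)    # shared sparse codes, n x n_atoms

# The stacked dictionary splits into one block per view.
D_view1 = learner.components_[:, :d]        # view-1 dictionary, n_atoms x d
D_view2 = learner.components_[:, d:]        # view-2 dictionary, n_atoms x d
```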
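For the attribute selection step, the abstract states that the selection criteria are formulated as a submodular optimization problem and solved greedily with a (1-1/e) guarantee. The sketch below shows the generic greedy procedure for maximizing a monotone submodular set function under a cardinality constraint; the toy coverage objective is a stand-in, not the dissertation's actual selection criteria.

```python
def greedy_select(candidates, f, k):
    """Greedily pick k items maximizing a monotone submodular set function f.

    For monotone submodular f with f({}) = 0, the greedy solution achieves at
    least a (1 - 1/e) fraction of the optimal value (Nemhauser et al., 1978).
    """
    selected, remaining = [], list(candidates)
    for _ in range(min(k, len(remaining))):
        # Pick the candidate with the largest marginal gain f(S + {a}) - f(S).
        best = max(remaining, key=lambda a: f(selected + [a]) - f(selected))
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Toy objective: how many distinct "concepts" a set of attributes covers.
    # Coverage functions like this are monotone submodular.
    concepts = {
        "furry":    {1, 2, 3},
        "striped":  {3, 4},
        "metallic": {5, 6, 7},
        "wheeled":  {6, 7, 8},
    }
    coverage = lambda attrs: len(set().union(*[concepts[a] for a in attrs], set()))
    print(greedy_select(concepts.keys(), coverage, k=2))
```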
dc.identifier: https://doi.org/10.13016/M2MH8F
dc.identifier.uri: http://hdl.handle.net/1903/17349
dc.language.iso: en (en_US)
dc.subject.pqcontrolled: Electrical engineering (en_US)
dc.subject.pqcontrolled: Computer science (en_US)
dc.subject.pquncontrolled: Action recognition (en_US)
dc.subject.pquncontrolled: Dictionary learning (en_US)
dc.subject.pquncontrolled: Domain transfer (en_US)
dc.subject.pquncontrolled: Object recognition (en_US)
dc.title: Domain Transfer Learning for Object and Action Recognition (en_US)
dc.type: Dissertation (en_US)

Files

Original bundle
Name: Zheng_umd_0117E_16726.pdf
Size: 6.22 MB
Format: Adobe Portable Document Format