LEARNING FROM MULTIPLE VIEWS OF DATA

dc.contributor.advisor: Jacobs, David William (en_US)
dc.contributor.author: Sharma, Abhishek (en_US)
dc.contributor.department: Computer Science (en_US)
dc.contributor.publisher: Digital Repository at the University of Maryland (en_US)
dc.contributor.publisher: University of Maryland (College Park, Md.) (en_US)
dc.date.accessioned: 2015-06-26T05:38:56Z
dc.date.available: 2015-06-26T05:38:56Z
dc.date.issued: 2015 (en_US)
dc.description.abstract: This dissertation takes inspiration from the brain's ability to extract information from and learn from multiple sources of data, and tries to mimic this ability for several practical problems. It explores the hypothesis that the human brain can extract and store information from raw data in a form, termed a common representation, suitable for cross-modal content matching. Human-level performance on this task requires (a) the ability to extract sufficient information from raw data and (b) algorithms that obtain a task-specific common representation from multiple sources of extracted information. This dissertation addresses both requirements and develops novel content extraction and cross-modal content matching architectures. The first part proposes a learning-based visual information extraction approach, the Recursive Context Propagation Network (RCPN), for semantic segmentation of images. RCPN is a deep neural network that exploits contextual information from the entire image through bottom-up followed by top-down context propagation, improving the feature representation of every super-pixel for better classification into semantic categories. Analysis of RCPN reveals that bypass-error paths can hinder effective context propagation; these errors are tackled by also including the classification loss of internal nodes. In addition, a novel tree-MRF structure built on the parse trees models the hierarchical dependencies present in the output labels. The second part develops algorithms to obtain and match common representations across different modalities. A novel Partial Least Squares (PLS) based framework is proposed to learn a common subspace from multiple modalities of data, and is applied to multi-modal face biometric problems such as pose-invariant face recognition and sketch-face recognition. The sensitivity of this approach to noise under pose variation is analyzed, and a two-stage discriminative model is developed to address it. Finally, a generalized framework, termed Generalized Multiview Analysis (GMA), extends popular feature extraction techniques that can be solved as a generalized eigenvalue problem to their multi-modal counterparts; it is used for pose- and lighting-invariant face recognition and text-image retrieval. (en_US)
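
The second part of the abstract describes learning a common subspace with PLS and then matching cross-modal content by nearest-neighbour search inside that subspace. As a rough illustration of that general idea only (not the dissertation's specific framework or its two-stage discriminative model), the sketch below uses scikit-learn's PLSCanonical as a stand-in PLS variant; the feature matrices, dimensions, and the photo/sketch naming are placeholder assumptions.

# Minimal sketch of cross-modal matching in a PLS-learned common subspace.
# Generic PLS illustration only, not the dissertation's exact framework;
# all feature matrices below are random placeholders for two modalities.
import numpy as np
from sklearn.cross_decomposition import PLSCanonical
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
n_train, n_test = 500, 50
d_photo, d_sketch = 256, 128

# Paired training features from the two modalities (e.g. photos and sketches).
X_train = rng.standard_normal((n_train, d_photo))
Y_train = rng.standard_normal((n_train, d_sketch))

# Learn per-view projections that maximize covariance between the two views.
pls = PLSCanonical(n_components=32)
pls.fit(X_train, Y_train)

# Project paired test data (probe photos, gallery sketches) into the
# common subspace; matching becomes a nearest-neighbour search there.
X_test = rng.standard_normal((n_test, d_photo))
Y_test = rng.standard_normal((n_test, d_sketch))
probes_c, gallery_c = pls.transform(X_test, Y_test)

# Rank gallery sketches for each probe photo by cosine similarity.
scores = cosine_similarity(probes_c, gallery_c)   # shape (n_test, n_test)
best_match = scores.argmax(axis=1)                # top-ranked gallery index per probe
print(best_match[:10])

GMA, by contrast, couples the per-view projections through a single generalized eigenvalue problem rather than PLS-style iterative deflation, which is what lets it wrap single-view techniques such as LDA or CCA into multi-view counterparts.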
dc.identifier: https://doi.org/10.13016/M2R90K
dc.identifier.uri: http://hdl.handle.net/1903/16629
dc.language.iso: en (en_US)
dc.subject.pqcontrolled: Computer science (en_US)
dc.subject.pquncontrolled: Computer Vision (en_US)
dc.subject.pquncontrolled: Deep Neural Networks (en_US)
dc.subject.pquncontrolled: Face Recognition (en_US)
dc.subject.pquncontrolled: Face Sketch Matching (en_US)
dc.subject.pquncontrolled: Multimodal Learning (en_US)
dc.subject.pquncontrolled: Semantic Segmentation (en_US)
dc.title: LEARNING FROM MULTIPLE VIEWS OF DATA (en_US)
dc.type: Dissertation (en_US)

Files

Original bundle
Name: Sharma_umd_0117E_16103.pdf
Size: 8.32 MB
Format: Adobe Portable Document Format