SCENE AND ACTION UNDERSTANDING USING CONTEXT AND KNOWLEDGE SHARING

dc.contributor.advisor: Davis, Larry S.
dc.contributor.advisor: Shrivastava, Abhinav
dc.contributor.author: Ghosh, Pallabi
dc.contributor.department: Computer Science
dc.contributor.publisher: Digital Repository at the University of Maryland
dc.contributor.publisher: University of Maryland (College Park, Md.)
dc.date.accessioned: 2021-02-14T06:36:48Z
dc.date.available: 2021-02-14T06:36:48Z
dc.date.issued: 2020
dc.description.abstract: Complete scene understanding from video involves spatio-temporal decision making over long sequences and the use of world knowledge. We propose a method that captures edge connections between spatio-temporal components, or knowledge graphs, through a graph convolutional network (GCN). Our approach uses the GCN to fuse diverse cues in the video, such as detected objects, human pose, and scene information, for action segmentation. For tasks such as zero-shot and few-shot action recognition, we learn a classifier for unseen test classes by comparing them with similar training classes, encoding the similarity between classes through an explicit relationship map, i.e., the knowledge graph. We study knowledge graphs built from action phrases, verbs or nouns, and visual features to compare how they perform against one another, and we build an integrated approach for zero-shot and few-shot learning. We show further improvements by adaptively learning the input knowledge graphs and by adding a triplet loss to the task-specific loss during training, and we also report semi-supervised results to quantify the gains from our graph learning technique. For complete scene understanding, we additionally study depth completion using a deep depth prior based on the deep image prior (DIP) technique. DIP shows that the structure of convolutional neural networks (CNNs) induces a strong prior that favors natural images. Given color images and noisy or incomplete target depth maps, we optimize a randomly initialized CNN to reconstruct a restored depth map, using the network structure as a prior combined with a view-constrained photo-consistency loss computed from images taken by a geometrically calibrated camera at nearby viewpoints. Because the method relies on test-time optimization, it is independent of training data distributions. We apply this deep depth prior to inpaint and refine incomplete and noisy depth maps within both binocular and multi-view stereo pipelines.
dc.identifier: https://doi.org/10.13016/zuhd-qgyl
dc.identifier.uri: http://hdl.handle.net/1903/26825
dc.language.iso: en
dc.subject.pqcontrolled: Computer science
dc.title: SCENE AND ACTION UNDERSTANDING USING CONTEXT AND KNOWLEDGE SHARING
dc.type: Dissertation
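The abstract above summarizes two techniques that lend themselves to short sketches. Both snippets below are illustrative only: they are minimal PyTorch-style stand-ins for the kinds of models described, not the dissertation's actual code, and every name, shape, and hyperparameter in them is an assumption.

First, a minimal graph convolution layer in the Kipf-and-Welling style, of the kind used to fuse per-node video cues (detected objects, human pose, scene features) over spatio-temporal or knowledge-graph edges:

    import torch
    import torch.nn as nn

    class GraphConvLayer(nn.Module):
        """One GCN layer, H' = relu(A_hat @ H @ W). Illustrative only."""
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.linear = nn.Linear(in_dim, out_dim)

        def forward(self, h, adj):
            # h: (N, in_dim) node features, e.g. stacked object/pose/scene
            # descriptors; adj: (N, N) normalized adjacency with self-loops
            # encoding the spatio-temporal or knowledge-graph edges.
            return torch.relu(self.linear(adj @ h))

    # Hypothetical usage: fuse features for 10 graph nodes.
    h = torch.randn(10, 256)        # per-node feature vectors
    adj = torch.eye(10)             # placeholder normalized adjacency
    fused = GraphConvLayer(256, 128)(h, adj)   # (10, 128) fused features

Second, a sketch of the deep-depth-prior idea (continuing with the imports above): a randomly initialized CNN is fit to a noisy or incomplete depth map at test time, so the network structure itself acts as the regularizer. The view-constrained photo-consistency loss is omitted here, and the tiny architecture, mask handling, and step count are assumptions for illustration:

    def fit_depth_prior(noisy_depth, valid_mask, steps=2000, lr=1e-3):
        # noisy_depth: (1, 1, H, W) target depth map; valid_mask is 1 where
        # a depth observation exists and 0 where it is missing.
        net = nn.Sequential(        # toy stand-in for an encoder-decoder CNN
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 3, padding=1),
        )
        z = torch.randn(1, 32, *noisy_depth.shape[-2:])  # fixed random input
        opt = torch.optim.Adam(net.parameters(), lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            pred = net(z)
            # Reconstruction loss only where observations exist; the CNN's
            # inductive bias inpaints the unobserved regions.
            loss = ((pred - noisy_depth) ** 2 * valid_mask).mean()
            loss.backward()
            opt.step()
        return net(z).detach()      # restored depth map

Because the optimization runs per scene at test time, no training data is needed; in the setting described by the abstract, the masked reconstruction term would be combined with a photo-consistency loss computed from calibrated nearby views.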

Files

Original bundle

Name: GHOSH_umd_0117E_21259.pdf
Size: 14.4 MB
Format: Adobe Portable Document Format