LEARNING OF DENSE OPTICAL FLOW, MOTION AND DEPTH, FROM SPARSE EVENT CAMERAS

dc.contributor.advisor: Aloimonos, Yiannis (en_US)
dc.contributor.advisor: Fermüller, Cornelia (en_US)
dc.contributor.author: Ye, Chengxi (en_US)
dc.contributor.department: Computer Science (en_US)
dc.contributor.publisher: Digital Repository at the University of Maryland (en_US)
dc.contributor.publisher: University of Maryland (College Park, Md.) (en_US)
dc.date.accessioned: 2019-09-27T05:39:43Z
dc.date.available: 2019-09-27T05:39:43Z
dc.date.issued: 2019 (en_US)
dc.description.abstract: With recent advances in autonomous driving, autonomous agents must safely navigate around humans and other moving objects in unconstrained, highly dynamic environments. In this thesis, we demonstrate the feasibility of reconstructing dense depth, optical flow, and motion information from a neuromorphic imaging device called the Dynamic Vision Sensor (DVS). The DVS records only sparse, asynchronous events, triggered when the brightness at a camera pixel changes. Ours is the first monocular pipeline that generates dense depth and optical flow from sparse event data alone. To tackle this problem of reconstructing dense information from sparse data, we introduce the Evenly-Cascaded convolutional Network (ECN), a bio-inspired multi-level, multi-resolution neural network architecture. The network features an evenly shaped design and makes use of both high- and low-level features. With just 150k parameters, our self-supervised pipeline surpasses pipelines that are 100x larger. We evaluate the pipeline on the MVSEC self-driving dataset and present results for depth, optical flow, and egomotion estimation in outdoor scenes in the wild. Thanks to the lightweight design, the inference part of the network runs at 250 FPS on a single GPU, making the pipeline ready for real-time robotics applications. Our experiments demonstrate significant improvements over previous work that applied deep learning to event data, and show that the pipeline performs well during both day and night. We also extend the pipeline to dynamic indoor scenes with independently moving objects. In addition to camera egomotion and a dense depth map, the network uses a mixture model to segment moving objects and compute per-object 3D translational velocities. For this indoor task we train a shallow network with just 40k parameters, which computes qualitative depth and egomotion. Our analysis of the training shows that modern neural networks are trained on tangled signals. This tangling can be viewed as a blurring introduced both by nature and by the training process. We propose to untangle the data with network deconvolution. We observe significantly better convergence without any standard normalization techniques, which suggests that deconvolution is what we need. (en_US)
dc.identifier: https://doi.org/10.13016/fhqf-g7xr
dc.identifier.uri: http://hdl.handle.net/1903/25034
dc.language.iso: en (en_US)
dc.subject.pqcontrolled: Computer science (en_US)
dc.subject.pquncontrolled: Artificial Neural Networks (en_US)
dc.subject.pquncontrolled: Deconvolution (en_US)
dc.subject.pquncontrolled: Deep Learning (en_US)
dc.subject.pquncontrolled: Sparse Event Camera (en_US)
dc.subject.pquncontrolled: Structure from Motion (en_US)
dc.title: LEARNING OF DENSE OPTICAL FLOW, MOTION AND DEPTH, FROM SPARSE EVENT CAMERAS (en_US)
dc.type: Dissertation (en_US)
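
The abstract above attributes the pipeline's improved convergence to "network deconvolution," i.e., untangling the blur-like correlations in the data before the network trains on it. The snippet below is a minimal, hypothetical NumPy sketch of that general idea — ZCA-whitening the im2col patches that feed a convolutional layer — included only for illustration; the function names, patch shapes, and the specific whitening formulation are assumptions and are not taken from the dissertation or its code.

    # Hypothetical sketch: whiten (decorrelate) convolution input patches so the
    # layer trains on "untangled" signals. Illustrative only; not the thesis code.
    import numpy as np

    def im2col(x, k):
        """Extract all k x k patches from a single-channel image x (H x W);
        returns an array of shape (num_patches, k*k)."""
        H, W = x.shape
        return np.stack([
            x[i:i + k, j:j + k].ravel()
            for i in range(H - k + 1)
            for j in range(W - k + 1)
        ])

    def zca_whiten(P, eps=1e-5):
        """ZCA-whiten a patch matrix P (num_patches x d): remove the blur-like
        correlations between neighboring pixels so the patch dimensions become
        approximately decorrelated with unit variance."""
        Pc = P - P.mean(axis=0, keepdims=True)
        cov = Pc.T @ Pc / len(Pc)
        w, V = np.linalg.eigh(cov)                 # eigendecomposition of covariance
        inv_sqrt = V @ np.diag(1.0 / np.sqrt(w + eps)) @ V.T
        return Pc @ inv_sqrt                       # covariance ~ identity afterwards

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        img = rng.standard_normal((32, 32))
        # Simulate the "blurring introduced by nature" with a simple 3-tap average.
        blurred = (np.roll(img, 1, axis=1) + img + np.roll(img, -1, axis=1)) / 3.0
        P = im2col(blurred, k=3)                   # correlated patches
        Pw = zca_whiten(P)                         # decorrelated patches
        print("patch covariance (first row, before):", np.round(np.cov(P.T)[0, :3], 2))
        print("patch covariance (first row, after): ", np.round(np.cov(Pw.T)[0, :3], 2))

In practice such a decorrelation would be applied per layer during training; the point of the sketch is only that a whitening transform can stand in for standard normalization as a way to speed convergence, which is the claim made in the abstract.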

Files

Original bundle

Name: Ye_umd_0117E_20245.pdf
Size: 12.65 MB
Format: Adobe Portable Document Format