Motion Segmentation and Egomotion Estimation with Event-Based Cameras

Mitrokhin, Anton
Aloimonos, Yiannis
Computer vision has been dominated for many years by classical, frame-based CMOS imaging sensors. Yet motion is poorly represented in classical cameras and vision techniques: traditional vision is frame-based and exists only 'in the moment', while motion is a continuous entity. With the introduction of neuromorphic hardware, such as event-based cameras, we are ready to move beyond frame-based vision and develop a new concept: motion-based vision. An event-based sensor provides dense temporal information about changes in the scene; it can 'see' motion at an equivalent of an almost infinite frame rate, making it a perfect fit for creating dense, long-term motion trajectories and enabling motion perception that is significantly more efficient and generic while remaining accurate. By design, an event-based sensor accommodates a large dynamic range and provides high temporal resolution and low latency, ideal properties for applications that demand high-quality motion estimation and tolerance to challenging lighting conditions. These properties come at a price: event-based sensors are noisy, their spatial resolution is relatively low, and their output, typically referred to as an event cloud, is asynchronous and sparse. Event sensors thus offer new opportunities for the robust visual perception so needed in autonomous robotics, but the nature of their output calls for different visual processing approaches. In this dissertation we develop methods and frameworks for motion segmentation and egomotion estimation on event-based data, starting with a simple optimization-based approach for camera motion compensation and object tracking, continuing with several deep learning pipelines, and throughout exploring the connection between the shapes of event clouds and scene motion.
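The optimization-based motion compensation mentioned above can be illustrated with a minimal sketch: warp each event back to a common time under a candidate flow, accumulate the warped events into an image, and score the candidate by image variance (the correctly compensated motion yields the sharpest image). The function names, image size, and scoring choice here are illustrative assumptions, not the dissertation's exact formulation.

```python
import numpy as np

def warp_events(events, flow):
    # events: (N, 3) array of (x, y, t); flow: (vx, vy) in pixels/second.
    # Warp every event back to time t = 0 under the candidate flow.
    x, y, t = events[:, 0], events[:, 1], events[:, 2]
    return x - flow[0] * t, y - flow[1] * t

def sharpness(events, flow, shape=(64, 64)):
    # Accumulate warped events into an image and return its variance;
    # flow that matches the true motion deblurs the image, maximizing this.
    xw, yw = warp_events(events, flow)
    img, _, _ = np.histogram2d(xw, yw, bins=shape,
                               range=[[0, shape[0]], [0, shape[1]]])
    return img.var()

# Simulated events from a point moving at 10 px/s along x.
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 1.0, 500)
events = np.stack([20 + 10 * t, np.full_like(t, 32.0), t], axis=1)

# The true flow compensates the motion better than no compensation.
assert sharpness(events, (10.0, 0.0)) > sharpness(events, (0.0, 0.0))
```

In practice the flow (or a full parametric camera motion) is found by maximizing this sharpness score over candidates, e.g. with a gradient-free or gradient-based optimizer.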
We collect EV-IMO, the first pixelwise-annotated motion segmentation dataset for event cameras, and propose a 3D graph-based learning approach for motion segmentation in the (x, y, t) domain. Finally, we develop a set of mathematical constraints for event streams that leverage their temporal density and connect the shape of the event cloud with camera and object motion.
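A graph-based learner in the (x, y, t) domain starts by connecting each event to its nearest neighbors in space-time. The sketch below shows one plausible preprocessing step, assuming a brute-force k-NN construction and an illustrative time-rescaling factor (real pipelines would use a KD-tree or GPU neighbor search; the scale and function names are not from the dissertation).

```python
import numpy as np

def event_knn_graph(events, k=4, time_scale=100.0):
    # events: (N, 3) array of (x, y, t).
    # Rescale time so temporal distance is comparable to pixel distance
    # (time_scale is an illustrative assumption, not a prescribed value).
    pts = events.astype(float).copy()
    pts[:, 2] *= time_scale
    # Pairwise squared distances; fine for small N, a KD-tree scales better.
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # exclude self-loops
    # Indices of the k nearest neighbors per event: shape (N, k).
    return np.argsort(d2, axis=1)[:, :k]

rng = np.random.default_rng(1)
events = np.column_stack([rng.uniform(0, 64, (100, 2)),   # x, y in pixels
                          rng.uniform(0, 0.05, 100)])     # t in seconds
nbrs = event_knn_graph(events)
assert nbrs.shape == (100, 4)
```

The resulting neighbor indices define the edges a graph neural network would message-pass over when segmenting independently moving objects in the event cloud.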