Dense 3D Reconstructions from Sparse Visual Data
Abstract
3D reconstruction, the problem of estimating the complete geometry or appearance of objects from partial observations (e.g., a few RGB images, partial shapes, videos), serves as a building block in many vision, graphics, and robotics applications, such as 3D scanning, autonomous driving, 3D modeling, augmented reality (AR), and virtual reality (VR). However, recovering 3D geometry from such sparse data is very challenging for machines because of occlusions and the irregularity and complexity of 3D objects. To address these challenges, this dissertation explores learning-based 3D reconstruction methods built on different 3D object representations for two tasks: reconstructing static objects and reconstructing the dynamic human body from limited data.
For the 3D reconstruction of static objects, we propose a multi-view representation that describes a 3D shape as a set of RGB images or depth maps rendered from multiple viewpoints. We first apply this representation to shape completion and develop deep learning methods that generate dense, high-resolution point clouds from partial observations. One problem with the multi-view representation, however, is inconsistency among the different views; to address it, we propose a multi-view consistency optimization strategy that encourages cross-view consistency at inference time. Finally, we extend the multi-view representation to dense 3D geometry and texture reconstruction from a single RGB image.
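To make the multi-view representation concrete, the toy sketch below projects a point cloud into a set of depth maps from virtual cameras placed on a circle around the object. It is a minimal illustration only: the camera layout, resolution, focal length, and helper names (look_at_rotation, render_depth_maps) are our own assumptions, not the dissertation's actual rendering pipeline.

```python
import numpy as np

def look_at_rotation(eye):
    """World-to-camera rotation for a camera at `eye` looking at the origin."""
    forward = -eye / np.linalg.norm(eye)
    right = np.cross(np.array([0.0, 1.0, 0.0]), forward)
    right /= np.linalg.norm(right)
    up = np.cross(forward, right)
    return np.stack([right, up, forward])

def render_depth_maps(points, n_views=8, radius=2.0, res=64, focal=40.0):
    """Project a point cloud into `n_views` depth maps on a circle around it."""
    depth = np.full((n_views, res, res), np.inf)
    for v in range(n_views):
        theta = 2.0 * np.pi * v / n_views
        eye = radius * np.array([np.cos(theta), 0.0, np.sin(theta)])
        cam = (points - eye) @ look_at_rotation(eye).T  # camera-frame coords
        z = cam[:, 2]
        vis = z > 1e-6                                  # points in front of camera
        px = (focal * cam[vis, 0] / z[vis] + res / 2).astype(int)
        py = (focal * cam[vis, 1] / z[vis] + res / 2).astype(int)
        ok = (px >= 0) & (px < res) & (py >= 0) & (py < res)
        # z-buffer: keep the nearest depth landing on each pixel
        np.minimum.at(depth[v], (py[ok], px[ok]), z[vis][ok])
    return depth

# Toy usage: depth maps of a random point cloud on the unit sphere
pts = np.random.randn(5000, 3)
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
print(render_depth_maps(pts).shape)  # (8, 64, 64)
```

One appeal of this representation is that, once a shape is expressed as a stack of 2D maps, completion and densification can reuse standard 2D network architectures before the views are fused back into a point cloud.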
Capturing and rendering realistic human appearance under varying poses and viewpoints is an important goal in computer vision and graphics. In the second part, we introduce techniques for creating 3D virtual human avatars from limited data (e.g., videos). We propose implicit representations of motion, texture, and geometry for human modeling, and we use neural rendering techniques for free-viewpoint synthesis of the dynamic, articulated human body. Our learned human avatars are photorealistic and fully controllable (pose, shape, viewpoint, etc.) and can be used for free-viewpoint video generation, animation, shape editing, telepresence, and AR/VR.
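As a minimal sketch of what a pose-conditioned implicit representation can look like, the toy model below maps a 3D query point and a pose code to an occupancy value with a small MLP. The architecture, dimensions, and the name PoseConditionedOccupancy are hypothetical stand-ins, not the dissertation's actual motion, texture, or geometry models.

```python
import torch
import torch.nn as nn

class PoseConditionedOccupancy(nn.Module):
    """Toy implicit body model: (3D query point, pose code) -> occupancy in [0, 1].

    The pose code stands in for whatever articulation parameters a real
    avatar model conditions on (e.g., skeletal joint angles); all sizes
    here are illustrative.
    """
    def __init__(self, pose_dim=16, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + pose_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, xyz, pose):
        # xyz: (N, 3) query points; pose: (pose_dim,) code shared by all queries
        pose = pose.unsqueeze(0).expand(xyz.shape[0], -1)
        return torch.sigmoid(self.net(torch.cat([xyz, pose], dim=-1)))

model = PoseConditionedOccupancy()
occ = model(torch.randn(1024, 3), torch.randn(16))
print(occ.shape)  # torch.Size([1024, 1])
```

Evaluating such a field on a dense grid and extracting a level set (e.g., with marching cubes) yields an explicit mesh of the body in the given pose, which is what makes implicit representations convenient for controllable avatars.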
Our proposed methods learn 3D reconstruction end to end from 2D image or video signals. We hope these learning-based methods will help future AI systems perceive and reconstruct the 3D world.