Single-View 3D Reconstruction of Animals

Humans have a remarkable ability to infer the 3D shape of objects from a single image. Even for complex, non-rigid objects such as people and animals, one picture tells us much about their 3D shape, configuration, and even the viewpoint from which the photo was taken. The same cannot yet be said for computers: existing solutions are limited, particularly for highly articulated and deformable objects. The purpose of this thesis is therefore to develop methods for single-view 3D reconstruction of non-rigid objects, specifically people and animals. Our goal is to recover a full 3D surface model of these objects from a single unconstrained image. The ability to do so, even with some user interaction, would have a profound impact on AR/VR and the entertainment industry. Immediate applications include virtual avatars and pets, virtual clothes fitting, and immersive games, as well as applications in biology, neuroscience, ecology, and farming. The problem is challenging, however, because these objects can appear in many different forms.

This thesis begins by providing the first fully automatic solution for recovering a 3D mesh of a human body from a single image. Our solution follows the classical paradigm of bottom-up estimation followed by top-down verification: the key is to solve for the most likely 3D model that explains the image observations, using powerful priors. The rest of the thesis explores how to extend this approach to other animals. Doing so reveals novel challenges whose common thread is the lack of specialized data. To solve the bottom-up estimation problem well, current methods rely on human supervision in the form of 2D part annotations; however, such annotations do not exist at the same scale for animals. We address this problem by means of data synthesis for fine-grained categories such as bird species. There is also little work that systematically addresses the 3D scanning of animals, which almost all prior works require for learning a deformable 3D model. We propose a solution that learns a 3D deformable model from a set of annotated 2D images with a template 3D mesh, and from a small set of 3D toy-figurine scans. We show results on birds, house cats, horses, cows, dogs, big cats, and even hippos. This thesis takes steps toward a fully automatic system for single-view 3D reconstruction of animals. We hope this work inspires more research in this direction.
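To make the top-down fitting paradigm concrete, the sketch below fits a toy linear deformable model to observed 2D keypoints by minimizing reprojection error plus a Gaussian shape prior. Everything here is illustrative: the random PCA-style basis, the scaled-orthographic camera, and the parameter names are assumptions for the sake of a self-contained example, not the actual models used in the thesis.

```python
# Illustrative top-down model fitting: solve for shape and pose parameters of a
# toy linear deformable model that best explain observed 2D keypoints, with an
# L2 (Gaussian) prior on the shape coefficients. Not the thesis's actual model.
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.transform import Rotation

rng = np.random.default_rng(0)
K, B = 8, 3                                  # number of 3D keypoints, basis size
mean_shape = rng.normal(size=(K, 3))         # template 3D keypoints (toy data)
basis = rng.normal(size=(B, K, 3)) * 0.1     # linear deformation basis (toy data)

def model_points(shape_coeffs):
    """3D keypoints of the deformable model for given shape coefficients."""
    return mean_shape + np.tensordot(shape_coeffs, basis, axes=1)

def project(points3d, rotvec, trans, scale):
    """Scaled-orthographic projection of rotated 3D points onto the image plane."""
    R = Rotation.from_rotvec(rotvec).as_matrix()
    return scale * (points3d @ R.T)[:, :2] + trans

# Synthesize "observed" 2D keypoints from known ground-truth parameters.
true_coeffs = np.array([0.5, -0.3, 0.2])
obs2d = project(model_points(true_coeffs), np.array([0.1, 0.2, 0.0]),
                np.array([1.0, -0.5]), 2.0)

def objective(params, prior_weight=1e-2):
    """Reprojection error (data term) plus shape prior (regularizer)."""
    coeffs, rotvec = params[:B], params[B:B + 3]
    trans, scale = params[B + 3:B + 5], params[B + 5]
    pts2d = project(model_points(coeffs), rotvec, trans, scale)
    reproj = np.sum((pts2d - obs2d) ** 2)
    prior = prior_weight * np.sum(coeffs ** 2)
    return reproj + prior

x0 = np.zeros(B + 6)
x0[-1] = 1.0                                 # start from unit scale, zero pose
res = minimize(objective, x0, method="L-BFGS-B")
print("objective at init:", objective(x0))
print("objective after fitting:", res.fun)
```

The same structure (data term from image observations plus learned priors over shape and pose) carries over when the toy model is replaced by a full deformable body model and the keypoints by bottom-up part detections.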