Towards markerless motion capture: model estimation, initialization and tracking

Sundaresan, Aravind
Chellappa, Ramalingam
Motion capture is an important application in diverse areas such as biomechanics, computer animation, and human-computer interaction. Current motion capture methods use markers that are attached to the body of the subject and are therefore intrusive. In applications such as pathological human movement analysis, these markers may introduce unknown artifacts in the motion and are, in general, cumbersome. We present a computer-vision-based system for markerless human motion capture that uses images obtained from multiple synchronized and calibrated cameras. We model the human body as a set of rigid segments connected in articulated chains, and we represent the subject volumetrically as a set of voxels computed from the camera images. We propose a novel bottom-up approach that segments the voxels into different articulated chains based on their mutual connectivity by mapping the voxels into Laplacian eigenspace. We prove properties of this mapping which show that it is ideal for mapping voxels on non-rigid chains in normal space to nodes that lie on smooth 1D curves in Laplacian eigenspace. We then use a 1D spline fitting procedure to segment the nodes according to the 1D curve to which they belong. The segmentation is followed by a top-down approach that uses our knowledge of the structure of the human body to register the segmented voxels to different articulated chains such as the head, trunk, and limbs. We propose a hierarchical algorithm to simultaneously initialize and estimate the pose and body model parameters for the subject. Finally, we propose a tracking algorithm that uses the estimated human body model and the pose initialized for a single frame of a given sequence to track the pose in the remaining frames. The tracker uses an iterative algorithm that estimates the pose by combining motion and shape cues in a predictor-corrector framework.
The motion and shape cues complement each other and overcome drift and local minima problems. We report results for our segmentation, model estimation, and pose estimation algorithms on 3D laser scans, synthetic data, and real video sequences with different subjects.
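
To make the Laplacian eigenspace idea in the abstract more concrete, the following is a minimal sketch and not the method of the dissertation itself. It assumes voxels are given as integer grid coordinates, connects them with 6-connectivity, and computes only the first non-trivial eigenvector of the graph Laplacian (the Fiedler vector), whereas the abstract describes a mapping into a multi-dimensional eigenspace followed by spline fitting. The function names `neighbors6` and `fiedler_vector` are hypothetical, and plain power iteration is used in place of a proper eigensolver only to keep the example dependency-free.

```python
import math

def neighbors6(voxels):
    """Adjacency lists for occupied voxels under 6-connectivity."""
    index = {v: i for i, v in enumerate(voxels)}
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    adj = [[] for _ in voxels]
    for (x, y, z), i in index.items():
        for dx, dy, dz in steps:
            j = index.get((x + dx, y + dy, z + dz))
            if j is not None:
                adj[i].append(j)
    return adj

def fiedler_vector(adj, iters=5000):
    """Approximate the eigenvector of L = D - A with the smallest non-zero
    eigenvalue, assuming the graph is connected.

    Runs power iteration on (c*I - L) while removing the constant vector,
    which is the eigenvector of L with eigenvalue zero.
    """
    n = len(adj)
    # Gershgorin bound: every eigenvalue of L is at most 2 * (max degree).
    c = 2.0 * max(len(a) for a in adj)
    v = [math.sin(i + 1.0) for i in range(n)]  # deterministic, generic start
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]  # deflate the constant eigenvector
        # w = (c*I - L) v, where (L v)_i = deg(i) * v_i - sum of neighbor values
        w = [c * v[i] - (len(adj[i]) * v[i] - sum(v[j] for j in adj[i]))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Hypothetical toy input: an L-shaped (bent) chain of 20 voxels.
chain = [(x, 0, 0) for x in range(10)] + [(9, y, 0) for y in range(1, 11)]
embedding = fiedler_vector(neighbors6(chain))

# Sorting voxels by their embedding value recovers the order along the chain,
# regardless of the bend in normal space.
order = sorted(range(len(chain)), key=lambda i: embedding[i])
print(order)
```

In this toy case the embedding depends only on voxel connectivity, so the bent chain is mapped to a monotone sequence of values along a single 1D coordinate. This is a simplified, one-dimensional instance of the behavior the abstract describes for articulated chains.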