View-Invariance in Visual Human Motion Analysis
Date
2004-04-29
Authors
Parameswaran, Vasudev
Advisor
Chellappa, Rama
Abstract
This thesis contributes to the solution of two problems in
visual human motion analysis: human action recognition and
human body pose estimation. Although a substantial body of
research has addressed these two problems, the important
issue of viewpoint invariance in the representation and
recognition of poses and actions has received comparatively
little attention, and it forms a key goal of this thesis.
Drawing on results from 2D projective invariance theory
and 3D mutual invariants, we present three approaches of
varying generality for human action representation and
recognition. A detailed analysis of the approaches reveals
key challenges, which are circumvented by enforcing spatial
and temporal coherency constraints. An extensive
performance evaluation on 2D projections of motion-capture
data and on manually segmented real image sequences
demonstrates that, in addition to viewpoint changes, the
approaches handle varying speeds of action execution (and
hence different video frame rates), different subjects, and
minor variability in the spatiotemporal dynamics of an
action.
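
The abstract does not spell out which invariants the three approaches use, but the cross-ratio is the canonical 2D projective invariant and conveys the underlying idea. The sketch below is purely illustrative (the point coordinates and the projective map are made up, not taken from the thesis): it checks numerically that the cross-ratio of four collinear points is unchanged by a projective transformation of the line.

    import numpy as np

    def cross_ratio(p1, p2, p3, p4):
        # Cross-ratio of four collinear points, here given by their 1D
        # coordinates along the line: ((p1-p3)(p2-p4)) / ((p1-p4)(p2-p3)).
        return ((p1 - p3) * (p2 - p4)) / ((p1 - p4) * (p2 - p3))

    # Four collinear points, parameterized by position along the line.
    pts = np.array([0.0, 1.0, 2.5, 4.0])

    # A projective transformation of the line: x -> (a*x + b) / (c*x + d).
    a, b, c, d = 2.0, 1.0, 0.3, 1.5
    mapped = (a * pts + b) / (c * pts + d)

    print(cross_ratio(*pts))     # 1.25
    print(cross_ratio(*mapped))  # 1.25 -- the cross-ratio is unchanged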
Next, we present a method for recovering the body-centric
coordinates of key joints and parts of a canonically scaled
human body, given an image of the body and point
correspondences for specific body joints in that image. The
problem is difficult because of body articulation and
perspective effects. To make it tractable, previous
researchers have resorted to restricting the camera model
or requiring an unrealistically large number of point
correspondences, both of which are more restrictive than
necessary. We present a solution for the general case of an
uncalibrated perspective camera. Our method requires only
that the torso not twist considerably, an assumption
satisfied by many poses of the body. We evaluate the method
quantitatively on synthetic data and qualitatively on real
images taken with unknown cameras from unknown viewpoints.
Both evaluations demonstrate the effectiveness of the
method at recovering the pose of the human body.
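
For contrast with the general uncalibrated-perspective solution described above, the sketch below illustrates the kind of restricted camera model that prior work relied on: under a scaled-orthographic (weak-perspective) camera, the magnitude of the depth offset between two joints follows directly from a known limb length. All numbers, the function name, and the assumed scale are illustrative; this is not the thesis's method.

    import math

    def joint_depth_offset(u1, v1, u2, v2, limb_length, scale):
        # Scaled-orthographic model: u = s*X, v = s*Y.  With a limb of
        # known 3D length L between the two joints,
        #   (Z1 - Z2)^2 = L^2 - ((u1-u2)^2 + (v1-v2)^2) / s^2,
        # so only the magnitude of the depth offset is recoverable; its
        # sign remains ambiguous.
        du, dv = u1 - u2, v1 - v2
        foreshortened_sq = (du * du + dv * dv) / (scale * scale)
        if foreshortened_sq > limb_length ** 2:
            raise ValueError("image segment too long for this limb/scale")
        return math.sqrt(limb_length ** 2 - foreshortened_sq)

    # Hypothetical example: an upper arm of canonical length 0.3
    # (body-scaled units) whose projection spans roughly 40 pixels at an
    # assumed scale of 200 pixels per unit.
    print(joint_depth_offset(100.0, 120.0, 130.0, 146.0, 0.3, 200.0))  # ~0.225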