Affective Human Motion Detection and Synthesis

Bhattacharya, Uttaran

dc.contributor.advisor	Manocha, Dinesh	en_US
dc.contributor.author	Bhattacharya, Uttaran	en_US
dc.contributor.department	Computer Science	en_US
dc.contributor.publisher	Digital Repository at the University of Maryland	en_US
dc.contributor.publisher	University of Maryland (College Park, Md.)	en_US
dc.date.accessioned	2023-02-01T06:36:46Z
dc.date.available	2023-02-01T06:36:46Z
dc.date.issued	2022	en_US
dc.description.abstract	Human emotion perception is an integral component of intelligent systems currently being designed for a wide range of socio-cultural applications, including behavior prediction, social robotics, medical therapy and rehabilitation, surveillance, and animation of digital characters in multimedia. Human observers perceive emotions from a number of cues or modalities, including faces, speech, and body expressions. Studies in affective computing indicate that emotions perceived from body expressions are extremely consistent across observers because humans tend to have less conscious control over their body expressions. Our research focuses on this aspect of emotion perception as we attempt to build predictive methods for automated emotion recognition from body expressions, and build generative methods for synthesizing digital characters with appropriate affective body expressions. This thesis elaborates on both components of our research in two parts, and explores how they can be applied to current problems in video understanding, specifically video highlight detection. The first part discusses two approaches for designing and training partially supervised methods for emotion recognition from body expressions, specifically gaits. In one approach, we leverage existing gait datasets annotated with emotions to generate large-scale synthetic gaits corresponding to the emotion labels. In the other approach, we leverage large-scale unlabeled gait datasets together with smaller annotated gait datasets to learn meaningful latent representations for emotion recognition. We design an autoencoder coupled with a classifier to learn latent representations for simultaneously reconstructing all input gaits and classifying the labeled gaits into emotion classes. The second part discusses generative methods to synthesize emotionally expressive bodily expressions, specifically gaits, gestures, and faces. 
The first method involves asynchronous generation, where we synthesize only one modality of the digital characters (in our case, gaits) with affective expressions. Our approach is to design an autoregression network that takes in a history of the characters' pose sequences and the intended future emotions to generate their future pose sequences with the desired affective expressions. The second method is the more challenging synchronous generation, where the affective contents of two modalities, such as body gestures and speech, need to be synchronized with each other. Our approach utilizes machine translation to translate from speech to body gestures, and adversarial discrimination to differentiate between original and synthesized gestures in terms of affective expressions, to produce state-of-the-art affective body gestures synchronized with speech. The final method takes synchronous generation a step further to three modalities, involving the synthesis of both facial expressions and body gestures synchronized with speech. This method attempts to break new ground in multimodal synthesis by simultaneously incorporating emotional expressions in more than one modality, and does so using data from affordable, consumer-grade devices such as RGB video cameras to enable democratized usage. Lastly, we explore the application of these approaches to industrial problems in video understanding, specifically video highlight detection. Our approach leads to state-of-the-art performance in detecting highlights in human-centric videos without requiring supervision in the form of highlight annotations. Our approach can be further fine-tuned to detect user-specific highlights at scale by automatically learning the video contents matching the users' preferences in their previously selected highlight clips.	en_US
dc.identifier	https://doi.org/10.13016/3crk-mosg
dc.identifier.uri	http://hdl.handle.net/1903/29575
dc.language.iso	en	en_US
dc.subject.pqcontrolled	Computer science	en_US
dc.subject.pquncontrolled	Affective Computing	en_US
dc.subject.pquncontrolled	Emotion Detection	en_US
dc.subject.pquncontrolled	Human Motion Synthesis	en_US
dc.subject.pquncontrolled	Machine Learning	en_US
dc.subject.pquncontrolled	Neural Networks	en_US
dc.subject.pquncontrolled	Synchronous Multimodal Expressions	en_US
dc.title	Affective Human Motion Detection and Synthesis	en_US
dc.type	Dissertation	en_US

Files

Name: Bhattacharya_umd_0117E_22923.pdf
Size: 108.04 MB
Format: Adobe Portable Document Format

Collections

UMD Theses and Dissertations
Computer Science Theses and Dissertations