Leveraging Structure in Activity Recognition: Context and Spatiotemporal Dynamics

Khamis, Sameh

dc.contributor.advisor	Davis, Larry S	en_US
dc.contributor.author	Khamis, Sameh	en_US
dc.contributor.department	Computer Science	en_US
dc.contributor.publisher	Digital Repository at the University of Maryland	en_US
dc.contributor.publisher	University of Maryland (College Park, Md.)	en_US
dc.date.accessioned	2015-06-25T05:50:08Z
dc.date.available	2015-06-25T05:50:08Z
dc.date.issued	2015	en_US
dc.description.abstract	Activity recognition is one of the fundamental problems of computer vision. An activity recognition system aims to identify the actions of humans from an image or a video. Historically, this problem has been approached in isolation, typically as one stage of a multi-stage system in which tracking, for instance, is another stage. However, recent work sheds light on how activity recognition is in fact entangled with other fundamental problems in the field. Tracking is one such problem, where the identity of each person is maintained across a video sequence. Scene classification is another, where scene properties are identified from image data. Affordance reasoning is yet another, where the objects in the scene are assigned labels representing the types of actions that can be performed upon them. In this thesis we build a joint formulation for activity recognition, modeling the aforementioned coupled problems as latent variables. Optimizing the objective function of this formulation allows us to recover a more accurate solution to activity recognition and, simultaneously, solutions to problems like tracking or scene classification. We first introduce a model that jointly solves tracking and activity recognition from videos. Instead of establishing tracks in a preprocessing step, the model solves a joint optimization problem, recovering actions and identities for every person in a video sequence. We then extend this model to include frame-level cues, where the activity labels assigned to people in the same scene are made mutually compatible through a scene-level label. In the second half of the thesis we look at an alternative formulation of the same problem, based on probabilistic logic. This new model leverages the same temporal and spatial cues through soft logic rules. This joint formulation can be solved efficiently, recovering both action labels and tracks.
We finally introduce another model that reformulates action recognition in the multi-label setting, where each person can be performing more than one action at the same time. In this setting, a joint formulation can solve for all the likely actions of a person through explicit modeling of action label correlations. We conclude with a discussion of several challenges and how they can motivate viable future extensions.	en_US
dc.identifier	https://doi.org/10.13016/M2VP7Z
dc.identifier.uri	http://hdl.handle.net/1903/16512
dc.language.iso	en	en_US
dc.subject.pqcontrolled	Computer science	en_US
dc.subject.pqcontrolled	Artificial intelligence	en_US
dc.subject.pqcontrolled	Computer engineering	en_US
dc.subject.pquncontrolled	Activity Recognition	en_US
dc.subject.pquncontrolled	Computer Vision	en_US
dc.subject.pquncontrolled	Machine Learning	en_US
dc.subject.pquncontrolled	Optimization	en_US
dc.subject.pquncontrolled	Scene Understanding	en_US
dc.title	Leveraging Structure in Activity Recognition: Context and Spatiotemporal Dynamics	en_US
dc.type	Dissertation	en_US

Files

Original bundle

Name: Khamis_umd_0117E_16005.pdf
Size: 3.52 MB
Format: Adobe Portable Document Format

Name: VideoSamples.zip
Size: 1.95 MB
Format: Unknown data format

Collections

UMD Theses and Dissertations
Computer Science Theses and Dissertations