Skip to content
University of Maryland LibrariesDigital Repository at the University of Maryland
    • Login
    View Item 
    •   DRUM
    • Theses and Dissertations from UMD
    • UMD Theses and Dissertations
    • View Item
    •   DRUM
    • Theses and Dissertations from UMD
    • UMD Theses and Dissertations
    • View Item
    JavaScript is disabled for your browser. Some features of this site may not work without it.

    Leveraging Structure in Activity Recognition: Context and Spatiotemporal Dynamics

    Thumbnail
    View/Open
    Khamis_umd_0117E_16005.pdf (3.521Mb)
    No. of downloads: 407

    VideoSamples.zip (1.953Mb)
    No. of downloads: 75

    Date
    2015
    Author
    Khamis, Sameh
    Advisor
    Davis, Larry S
    DRUM DOI
    https://doi.org/10.13016/M2VP7Z
    Metadata
    Show full item record
    Abstract
    Activity recognition is one of the fundamental problems of computer vision. An activity recognition system aims to identify the actions of humans from an image or a video. This problem has been historically approached in isolation, and typically as part of a multi-stage system, where tracking for instance is another part. However, recent work sheds light on how activity recognition is in fact entangled with other fundamental problems in the field. Tracking is one such instance, where the identity of each person is maintained across a video sequence. Scene classification is another example, where scene properties are identified from image data. Affordance reasoning is yet another, where the objects in the scene are assigned labels representing what types of actions can be performed upon them. In this thesis we build a joint formulation for activity recognition, modeling the aforementioned coupled problems as latent variables. Optimizing the objective function for this formulation allows us to recover a more accurate solution to activity recognition and simultaneously solutions to problems like tracking or scene classification. We first introduce a model that jointly solves tracking and activity recognition from videos. Instead of establishing tracks in a preprocessing step, the model solves a joint optimization problem, recovering actions and identities for every person in a video sequence. We then extend this model to include frame-level cues, where activity labels assigned to people in the same scene are inter-compatible through a scene-level label. In the second half of the thesis we look at an alternative formulation of the same problem, based on probabilistic logic. This new model leverages the same cues, temporal and spatial, through soft logic rules. This joint formulation can be efficiently solved, recovering both action labels and tracks. We finally introduce another model that reformulates action recognition in the multi-label setting, where each person can be performing more than one action at the same time. In this setting, a joint formulation can solve for all the likely actions of a person through explicit modeling of action label correlations. Finally, we conclude with a discussion of several challenges and how they can motivate viable future extensions.
    URI
    http://hdl.handle.net/1903/16512
    Collections
    • Computer Science Theses and Dissertations
    • UMD Theses and Dissertations

    DRUM is brought to you by the University of Maryland Libraries
    University of Maryland, College Park, MD 20742-7011 (301)314-1328.
    Please send us your comments.
    Web Accessibility
     

     

    Browse

    All of DRUMCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

    My Account

    LoginRegister
    Pages
    About DRUMAbout Download Statistics

    DRUM is brought to you by the University of Maryland Libraries
    University of Maryland, College Park, MD 20742-7011 (301)314-1328.
    Please send us your comments.
    Web Accessibility