Activity Detection in Untrimmed Videos

In this dissertation, we present solutions to the problem of activity detection in untrimmed videos, where the goal is to identify both when and where activity instances occur within an unconstrained video. Advances in machine learning, particularly the widespread adoption of deep learning-based methods, have yielded robust solutions in a number of historically difficult computer vision application domains. For example, recent systems for object recognition and detection, facial identification, and a number of language processing applications have found widespread commercial success, and in some cases such systems have outperformed humans. The same cannot be said for activity detection in untrimmed videos. This dissertation describes our investigation of, and innovative solutions for, the challenging problem of real-time activity detection in untrimmed videos. The main contributions of our work are multiple novel activity detection systems that make strides toward the goal of commercially viable activity detection. The first work introduces a proposal mechanism based on divisive hierarchical clustering of objects to produce cuboid activity proposals, followed by a classification and temporal refinement step. The second work proposes a chunk-based processing mechanism and explores the tradeoff between tube and cuboid proposals. The third work explores real-time activity detection and introduces strategies for achieving real-time performance. The final work provides a detailed look at multiple novel extensions that improve upon the state of the art in the field.