Petri Net Models for Event Recognition in Surveillance Videos


Video surveillance is the process of monitoring the behavior of people and objects in public places, e.g., airports and traffic intersections, by means of cameras, usually for safety and security purposes. As the amount of video data gathered daily by surveillance cameras grows, so does the need for automatic systems that detect and recognize suspicious activities performed by people and objects.

The first part of the thesis describes a framework for modeling and recognizing events in surveillance video. Our framework is based on deterministic inference using Petri nets. Events can be composed by combining primitive events and previously defined events through spatial, temporal, and logical relations. We provide a graphical user interface (GUI) for formulating such event models. Our approach automatically maps each model into a set of Petri net filters that represent the components of the event. Lower-level video processing modules, e.g., background subtraction, tracking, and classification, detect the occurrence of primitive events. These primitive events are then passed through the Petri net filters to recognize composite events of interest. Our framework is general, and we have applied it to several surveillance domains.
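The filtering idea above can be sketched as a minimal Petri net that consumes a stream of primitive events and signals when a token reaches a designated final place. The event names, place names, and net topology below are hypothetical illustrations, not the thesis's actual models:

```python
# Minimal sketch of a Petri net event filter (hypothetical example, not the
# thesis's implementation). Each transition is labeled with a primitive event;
# a transition fires only when its input place holds a token, so the net
# enforces the temporal order of the composite event's components.

class PetriNetFilter:
    """Recognizes a composite event when a token reaches the sink place."""

    def __init__(self, transitions, start_place, sink_place):
        # transitions: {primitive_event_name: (input_place, output_place)}
        self.transitions = transitions
        self.sink = sink_place
        self.tokens = {start_place}

    def observe(self, event):
        """Consume one primitive event; fire the matching enabled transition."""
        if event in self.transitions:
            src, dst = self.transitions[event]
            if src in self.tokens:
                self.tokens.discard(src)
                self.tokens.add(dst)
        # Composite event is recognized once a token reaches the sink place.
        return self.sink in self.tokens

# Hypothetical sequential model: "enter_zone" followed by "drop_object".
net = PetriNetFilter(
    transitions={"enter_zone": ("p0", "p1"), "drop_object": ("p1", "p2")},
    start_place="p0",
    sink_place="p2",
)
recognized = False
for ev in ["walk", "enter_zone", "drop_object"]:
    recognized = net.observe(ev)
print(recognized)  # True: both primitives occurred in the required order
```

Out-of-order primitives leave the net's marking unchanged, which is how such a filter rejects event streams that do not match the model.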

In the second part of the thesis, we address the problem of detecting carried objects, which is the main step in solving the left-object detection problem. We present two approaches to this problem, both of which pose it as a classification problem. For both approaches, we trained SVM classifiers on a laboratory database containing examples of people seen with and without two common objects, namely backpacks and suitcases. We used a boosting technique, AdaBoost, to select the most discriminative features used by the SVMs and to enhance the performance of the classifiers. We give recognition results for each approach, compare the two, and describe the advantages of each. We also compare the performance of both approaches on real-world videos captured at the Munich airport.
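The boosting-then-SVM pipeline can be sketched as follows. This is an illustrative example using scikit-learn and synthetic data, not the thesis's database, features, or code; it uses AdaBoost's feature importances as the selection criterion, then trains the SVM on the selected features only:

```python
# Illustrative sketch (synthetic data, hypothetical feature set): rank
# features with AdaBoost over decision stumps, keep the top-ranked ones,
# then train an SVM on that reduced feature vector.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Stand-in for person-image feature vectors: 200 samples, 50 features,
# where only the first 5 features actually carry the class signal.
X = rng.normal(size=(200, 50))
y = (X[:, :5].sum(axis=1) > 0).astype(int)

# AdaBoost's per-feature importances reflect how often each feature is
# chosen by the weak learners (decision stumps) across boosting rounds.
booster = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
top = np.argsort(booster.feature_importances_)[::-1][:5]

# Train the SVM on the selected, most discriminative features only.
svm = SVC(kernel="rbf").fit(X[:, top], y)
print("selected features:", sorted(int(i) for i in top))
print("training accuracy:", round(svm.score(X[:, top], y), 2))
```

Restricting the SVM to the boosted feature subset both reduces dimensionality and tends to improve accuracy when many raw features are uninformative.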