Online Filtering, Smoothing and Probabilistic Modeling of Streaming Data

CS-TR-4867.pdf (417.49 KB)

In this paper, we address the problem of extending a relational database system so that dynamic probabilistic models can be applied efficiently, in real time, to streaming data. We build on the recently proposed abstraction of model-based views: users declaratively specify the model to be applied, and the system presents the model's output as a probabilistic database view. We support declarative querying over such views using an extended version of SQL that allows for querying probabilistic data. Internally, we use particle filters, a class of sequential Monte Carlo algorithms commonly used to implement dynamic probabilistic models, to represent the present and historical states of the model as sets of weighted samples (particles) that are kept up to date as new readings arrive. We develop novel techniques to convert queries on the model-based view directly into queries over particle tables, enabling highly efficient query processing. Finally, we present an experimental evaluation of our prototype implementation on sensor data from the Intel Lab dataset. The evaluation demonstrates the feasibility of online modeling of streaming data using our system and establishes the advantages of such tight integration between dynamic probabilistic models and database systems.
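
To make the particle representation concrete, the following is a minimal sketch of a bootstrap particle filter, not the implementation described in the report. It assumes a one-dimensional state that follows a Gaussian random walk and is observed with Gaussian noise; the function names, the noise parameters, and the list-of-rows "particle table" are all hypothetical choices made for illustration. Each new reading triggers a predict, weight, and resample step, and an expected-value query over the view reduces to a weighted average over the particle rows.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def init_particles(n, low, high, rng):
    """Build a particle table: n (value, weight) rows drawn uniformly."""
    return [(rng.uniform(low, high), 1.0 / n) for _ in range(n)]


def step(particles, reading, rng, process_std=0.5, obs_std=1.0):
    """Update the particle table for one new reading (assumed model)."""
    n = len(particles)
    # Predict: move each particle according to the assumed random-walk model.
    moved = [rng.gauss(x, process_std) for x, _ in particles]
    # Weight: scale each weight by the likelihood of the new reading.
    weights = [
        w * gaussian_pdf(reading, x, obs_std)
        for x, (_, w) in zip(moved, particles)
    ]
    total = sum(weights)
    if total == 0.0:
        # All likelihoods underflowed; fall back to uniform weights.
        weights = [1.0 / n] * n
    else:
        weights = [w / total for w in weights]
    # Resample: draw n particles in proportion to weight, then reset weights.
    resampled = rng.choices(moved, weights=weights, k=n)
    return [(x, 1.0 / n) for x in resampled]


def expected_value(particles):
    """Answer an expected-value query as a weighted average over the rows."""
    total_weight = sum(w for _, w in particles)
    return sum(x * w for x, w in particles) / total_weight


if __name__ == "__main__":
    rng = random.Random(0)
    table = init_particles(500, 0.0, 40.0, rng)
    for reading in [20.1, 19.8, 20.3, 20.0, 19.9]:  # made-up readings
        table = step(table, reading, rng)
    print(round(expected_value(table), 2))
```

In a relational setting, the same weighted average could be expressed as an aggregate over a table with one row per particle, which is the kind of rewriting the abstract refers to; the exact table schema and SQL extensions used in the report are not reproduced here.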