Representing and Querying Correlated Tuples in Probabilistic Databases
Representing and Querying Correlated Tuples in Probabilistic Databases
Files
Publication or External Link
Date
2006-08-07
Authors
Sen, Prithviraj
Deshpande, Amol
Advisor
Citation
DRUM DOI
Abstract
Probabilistic databases have received considerable attention recently due
to the need for storing uncertain data produced by many real world
applications. The widespread use of probabilistic databases is hampered by
two limitations: (1) current probabilistic databases make simplistic
assumptions about the data (e.g., complete independence among tuples) that
make it difficult to use them in applications that naturally produce
correlated data, and (2) most probabilistic databases can only answer a
restricted subset of the queries that can be expressed using traditional
query languages. We address both these limitations by proposing a
framework that can represent not only probabilistic tuples, but also
correlations that may be present among them. Our proposed framework
naturally lends itself to the possible world semantics thus preserving the
precise query semantics extant in current probabilistic databases. We
develop an efficient strategy for query evaluation over such probabilistic
databases by casting the query processing problem as an inference problem
in an appropriately constructed probabilistic graphical model. We present
several optimizations specific to probabilistic databases that enable
efficient query evaluation. We validate our approach by presenting an
experimental evaluation that illustrates the effectiveness of our
techniques at answering various queries using real and synthetic datasets.