Representing and Querying Uncertain Data
Publication or External Link
There has been a longstanding interest in building systems that can handle uncertain data. Traditional database systems inherently assume exact data and harbor fundamental limitations when it comes to handling uncertain data. In this dissertation, we present a probabilistic database model that can compactly represent uncertainty models in full generality. Our representation is associated with precise and intuitive semantics and we show that the answer to every user-submitted query can be obtained by performing probabilistic inference. To query large-scale probabilistic databases, we propose a number of techniques that help scale probabilistic inference. Foremost among these techniques is a novel lifted inference algorithm that determines and exploits symmetries in the uncertainty model to speed up query evaluation. For cases when the uncertainty model stored in the database does not contain symmetries, we propose a number of techniques that perform approximate lifted inference. Our techniques for approximate lifted inference have the added advantage of allowing the user to control the degree of approximation through a handful of tunable parameters. Besides scaling probabilistic inference, we also develop techniques that alter the structure of inference required to evaluate a query. More specifically, we show that for a restricted model of our probabilistic database, if each result tuple can be represented by a boolean formula with special characteristics, i.e., it is a read-once function, then the complexity of inference can be drastically reduced. We conclude the dissertation with a listing of directions for future work.