XJoin: Getting Fast Answers From Slow and Bursty Networks

Thumbnail Image


CS-TR-3994.ps (388.97 KB)
No. of downloads: 311
CS-TR-3994.pdf (271.19 KB)
No. of downloads: 1128

Publication or External Link







The combination of increasingly ubiquitous Internet connectivity and advances in heterogeneous and semi-structured databases has the potential to enable database-style querying over data from sources distributed around the world. Traditional query processing techniques, however, fail to deliver acceptable performance in such a scenario for two main reasons: First, they optimize for delivery of the entire query result, while on-line users would typically benefit from receiving initial results as quickly as possible. Second, slow or bursty delivery of data from remote sources can stall query execution, making the already inadequate batch-like behavior even worse. Both of these problems can be addressed using fully pipelined query execution. The symmetric hash join operator supports such pipelining, but it requires all base data and intermediate results to be memory-resident, which is unacceptable for complex queries over large datasets. In this paper we present a multi-threaded extension of the symmetric hash join, called XJoin, that can execute effectively with far less memory. By reactively scheduling background processing, XJoin hides intermittent delays in data arrival to produce more tuples earlier. XJoin includes a very efficient, on-the-fly algorithm for preventing duplicates from being created by its independently running threads. We have implemented the XJoin operator and added it to the PREDATOR Object-Relational DBMS. Using this implementation along with traces obtained by monitoring Internet data delivery, we show that XJoin is an effective solution for providing fast query responses to users even in the presence of slow and bursty remote sources. (Also cross-referenced as UMIACS-TR-99-13)