XJoin: Getting Fast Answers From Slow and Bursty Networks
XJoin: Getting Fast Answers From Slow and Bursty Networks
Files
Publication or External Link
Date
1999-02-26
Authors
Urhan, Tolga
Franklin, Michael J.
Advisor
Citation
DRUM DOI
Abstract
The combination of increasingly ubiquitous Internet connectivity
and advances in heterogeneous and semi-structured databases has
the potential to enable database-style querying over data from
sources distributed around the world. Traditional query processing
techniques, however, fail to deliver acceptable performance
in such a scenario for two main reasons: First, they
optimize for delivery of the entire query result, while on-line users
would typically benefit from receiving initial results as quickly
as possible. Second, slow or bursty delivery of data from remote
sources can stall query execution, making the already
inadequate batch-like behavior even worse. Both of these problems
can be addressed using fully pipelined query execution. The symmetric
hash join operator supports such pipelining, but it requires all
base data and intermediate results to be memory-resident, which is
unacceptable for complex queries over large datasets.
In this paper we present a multi-threaded extension of the symmetric
hash join, called XJoin, that can execute effectively with far less memory.
By reactively scheduling background processing, XJoin hides
intermittent delays in data arrival to produce more tuples earlier.
XJoin includes a very efficient, on-the-fly algorithm for preventing
duplicates from being created by its independently running threads.
We have implemented the XJoin operator and added
it to the PREDATOR Object-Relational DBMS. Using this implementation along
with traces obtained by monitoring Internet data delivery, we show
that XJoin is an effective solution for providing fast query responses
to users even in the presence of slow and bursty remote sources.
(Also cross-referenced as UMIACS-TR-99-13)