Learning Response Time for WebSources using Query Feedback and Application in Query Optimization

Thumbnail Image


CS-TR-3952.ps (1.1 MB)
No. of downloads: 361
CS-TR-3952.pdf (484.55 KB)
No. of downloads: 1768

Publication or External Link







The rapid growth of the Internet and support for interoperability protocols has increased the number of Web accessible sources, WebSources. Current optimization technology for wrapper mediator architectures needs to be extended to estimate the response time (delays) to access WebSources and to use this delay in query optimization. In this paper, we present a Multi-Dimensional Table (MDT), a tool that is based on learning using query feedback from WebSources. We describe the MDT learning algorithms, and report on the MDT learning for WebSources. The MDT uses dimensions Time of day, Day, and Quantity of data, to learn response times from a particular WebSource, and to predict the expected response time (delay), and a confidence in this prediction, for some query. Experiment data was collected from several WebSources and analyzed, to determine those dimensions that were significant in estimating the response time for particular WebSources. Our research shows that we can improve the quality of learning by tuning the MDT features, e.g., including significant dimensions in the MDT, or changing the ordering of dimensions. We then demonstrate how the MDT prediction of delay may be used by a scrambling enabled optimizer.
A scrambling algorithm identifies some critical points of delay, where it makes a decision to scramble (modify) a plan, to attempt to hide the expected delay by computing some other part of the plan that is unaffected by the delay. We explore the space of real delay at a WebSource, versus the MDT prediction of this delay, with respect to critical points of delay in specific plans. We identify those cases where MDT overestimation or underestimation of the real delay results in a penalty in the scrambling enabled optimizer, and those cases where there is no penalty. Using the experimental data and MDT learning, we test how good the MDT is in minimizing these penalties. Also cross-referenced as UMIACS TR #98-64