Query Planning for Range Queries with User-defined Aggregation on Multi-dimensional Scientific Datasets

Files

CS-TR-3996.ps (1.09 MB)
CS-TR-3996.pdf (595.64 KB)

Date

1999-02-23

Abstract

Applications that make use of very large scientific datasets have become an increasingly important subset of scientific applications. In these applications, the datasets are often multi-dimensional, i.e., data items are associated with points in a multi-dimensional attribute space. The processing is usually highly stylized, with the basic processing steps consisting of (1) retrieval of a subset of all available data in the input dataset via a range query, (2) projection of each input data item to one or more output data items, and (3) some form of aggregation of all the input data items that project to each output data item. We have developed an infrastructure, called the Active Data Repository (ADR), that integrates storage, retrieval and processing of multi-dimensional datasets on shared-nothing architectures. In this paper we address query planning and execution strategies for range queries with user-defined processing. We evaluate three potential query planning strategies within the ADR framework under several application scenarios, and present experimental results on the performance of the strategies on a multiprocessor IBM SP2. (Also cross-referenced as UMIACS-TR-99-15)
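The three processing steps described in the abstract can be illustrated with a minimal single-process sketch. It assumes each input item is a (point, value) pair, that the range query is an axis-aligned box, and that the user supplies a projection function plus an init/accumulate pair for aggregation. All names here (`run_query`, `project`, `init`, `accumulate`) are hypothetical and are not ADR's actual interface. The sketch also does not model how ADR partitions data and accumulators across processors on a shared-nothing machine, which is what the query planning strategies in the report address.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Point = Tuple[float, ...]   # a point in the multi-dimensional attribute space
Cell = Tuple[int, ...]      # an output data item, identified by grid coordinates
Item = Tuple[Point, float]  # an input data item: (attribute-space point, value)


def in_range(p: Point, lo: Point, hi: Point) -> bool:
    """True if point p lies inside the closed box [lo, hi] in every dimension."""
    return all(l <= x <= h for x, l, h in zip(p, lo, hi))


def range_query(items: Iterable[Item], lo: Point, hi: Point) -> List[Item]:
    """Step 1: retrieve the subset of input items that fall in the query box."""
    return [(p, v) for p, v in items if in_range(p, lo, hi)]


def run_query(
    items: Iterable[Item],
    lo: Point,
    hi: Point,
    project: Callable[[Point], List[Cell]],      # hypothetical user projection
    init: Callable[[], float],                   # hypothetical accumulator init
    accumulate: Callable[[float, float], float], # hypothetical user aggregation
) -> Dict[Cell, float]:
    """Steps 2 and 3: project each selected item to one or more output cells
    and fold its value into each cell's accumulator."""
    acc: Dict[Cell, float] = {}
    for p, v in range_query(items, lo, hi):
        for cell in project(p):
            if cell not in acc:
                acc[cell] = init()
            acc[cell] = accumulate(acc[cell], v)
    return acc


# Usage: 2-D input points are projected onto a 1-D output grid by dropping y
# and bucketing x into unit-width cells; aggregation keeps the maximum value.
data = [((0.2, 5.0), 3.0), ((0.7, 1.0), 8.0), ((1.5, 2.0), 4.0), ((3.0, 3.0), 9.0)]
result = run_query(
    data, lo=(0.0, 0.0), hi=(2.0, 10.0),
    project=lambda p: [(int(p[0]),)],
    init=lambda: float("-inf"),
    accumulate=max,
)
print(result)  # {(0,): 8.0, (1,): 4.0}
```

Keeping projection and aggregation as caller-supplied functions mirrors the idea of user-defined processing: the retrieval loop stays fixed while the operator changes. The item at x = 3.0 is excluded by the range query before projection, so it never touches an accumulator.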
