Optimizing Retrieval and Processing of Multi-dimensional Scientific
Datasets
Date
2000-02-02
Authors
Chang, Chialin
Kurc, Tahsin
Sussman, Alan
Saltz, Joel
Abstract
Exploring and analyzing large volumes of data plays an increasingly
important role in many domains of scientific research. We have been
developing the Active Data Repository (ADR), an infrastructure that
integrates storage, retrieval, and processing of large multi-dimensional
scientific datasets on distributed memory parallel machines with multiple
disks attached to each node. In earlier work, we proposed three strategies
for processing range queries within the ADR framework. Our experimental
results show that the relative performance of the strategies changes under
varying application characteristics and machine configurations. In this
work, we investigate approaches to guide and automate the selection of the
best strategy for a given application and machine configuration. We
describe analytical models to predict the relative performance of the
strategies when input data elements are uniformly distributed in the
attribute space of the output dataset, restricting the output dataset to
be a regular $d$-dimensional array. We present an experimental evaluation
of these models for various synthetic datasets and for several driving
applications on a 128-node IBM SP.
(Also cross-referenced as UMIACS-TR-2000-03)