Infrastructure for Building Parallel Database Systems for
Multi-dimensional Data
Date
1998-10-15
Authors
Chang, Chialin
Sussman, Alan
Saltz, Joel
Abstract
As computational power and storage capacity increase, processing and
analyzing large volumes of multi-dimensional datasets play an increasingly
important part in many domains of scientific research.
Our study of a large set of scientific applications over the past three
years indicates that the processing for such datasets is often highly
stylized and shares several important characteristics. Usually, both the
input dataset and the result being computed have underlying
multi-dimensional grids. The basic processing step usually consists of
transforming individual input items, mapping the transformed items to the
output grid and computing output items by aggregating, in some way, all the
transformed input items mapped to the corresponding grid point. In this
paper, we present the design of T2, a customizable parallel database
that integrates storage, retrieval and processing of multi-dimensional
datasets. T2 provides support for common operations including index
generation, data retrieval, memory management, scheduling of processing
across a parallel machine and user interaction. It achieves its primary
advantage from the ability to seamlessly integrate data retrieval and
processing for a wide variety of applications and from the ability to
maintain and jointly process multiple datasets with different underlying
grids. We also present some preliminary performance results comparing the
implementation of a remote-sensing image database using the T2 services
with a custom-built integrated implementation.
(Also cross-referenced as UMIACS-TR-98-24)
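
The basic processing step described in the abstract (transform each input item, map the transformed item to the output grid, aggregate everything that lands on the same grid point) can be illustrated with a minimal sequential sketch. This is not T2's interface: the function `process`, its parameters, and the sample readings are all hypothetical names and data chosen for illustration, and the sketch ignores indexing, memory management, and parallel scheduling entirely.

```python
def process(items, transform, map_to_cell, aggregate, initial):
    """Hypothetical transform / map / aggregate loop (not the T2 API).

    items       -- iterable of input items
    transform   -- function applied to each input item
    map_to_cell -- maps a transformed item to an output grid point
    aggregate   -- combines the running value at a grid point with a new item
    initial     -- starting value for a grid point that has seen no items
    """
    output = {}
    for item in items:
        transformed = transform(item)
        cell = map_to_cell(transformed)
        output[cell] = aggregate(output.get(cell, initial), transformed)
    return output


# Made-up input: (x, y, reading) samples on a 4x4 input grid.
readings = [(0, 0, 1.0), (1, 0, 3.0), (2, 3, 2.0), (3, 3, 5.0)]

result = process(
    readings,
    # Transform: rescale the reading, keep the coordinates.
    transform=lambda r: (r[0], r[1], r[2] * 2.0),
    # Map: coarsen the 4x4 input grid onto a 2x2 output grid.
    map_to_cell=lambda t: (t[0] // 2, t[1] // 2),
    # Aggregate: keep the maximum value seen at each output grid point.
    aggregate=lambda acc, t: max(acc, t[2]),
    initial=float("-inf"),
)
print(result)
```

Here the input and output grids differ in resolution, and the three user-supplied functions are the only application-specific parts; the surrounding loop stays the same across applications, which is the kind of shared structure the abstract points to.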