Infrastructure for Building Parallel Database Systems for Multi-dimensional Data

CS-TR-3894.pdf(260.83 KB)
Chang, Chialin
Sussman, Alan
Saltz, Joel
As computational power and storage capacity increase, the processing and analysis of large multi-dimensional datasets plays an increasingly important part in many domains of scientific research. Our study of a large set of scientific applications over the past three years indicates that the processing for such datasets is often highly stylized and shares several important characteristics. Usually, both the input dataset and the result being computed have underlying multi-dimensional grids. The basic processing step usually consists of transforming individual input items, mapping the transformed items to the output grid, and computing each output item by aggregating, in some way, all the transformed input items mapped to the corresponding grid point. In this paper, we present the design of T2, a customizable parallel database that integrates storage, retrieval and processing of multi-dimensional datasets. T2 provides support for common operations including index generation, data retrieval, memory management, scheduling of processing across a parallel machine, and user interaction. Its primary advantage comes from the ability to seamlessly integrate data retrieval and processing for a wide variety of applications, and from the ability to maintain and jointly process multiple datasets with different underlying grids. We also present some preliminary performance results comparing the implementation of a remote-sensing image database using the T2 services with a custom-built integrated implementation. (Also cross-referenced as UMIACS-TR-98-24)
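
The basic processing step described in the abstract (transform each input item, map it to the output grid, aggregate per grid point) can be illustrated with a minimal sequential sketch. This is not T2's interface: the function `process_items`, its parameters, and the sample data are all hypothetical names chosen for illustration, and the sketch ignores the parallel scheduling, indexing, and memory management that T2 is said to provide.

```python
from collections import defaultdict


def process_items(items, transform, map_to_grid, aggregate, initial):
    """Hypothetical sketch of the transform / map / aggregate step.

    transform:   input item -> transformed item
    map_to_grid: transformed item -> (output grid point, value)
    aggregate:   (accumulated value, new value) -> accumulated value
    initial:     starting accumulator for every output grid point
    """
    output = defaultdict(lambda: initial)
    for item in items:
        transformed = transform(item)
        cell, value = map_to_grid(transformed)
        output[cell] = aggregate(output[cell], value)
    return dict(output)


# Made-up input: (x, y, reading) samples on a fine two-dimensional grid.
samples = [(0.2, 0.7, 3.0), (0.9, 0.1, 5.0), (1.4, 0.3, 2.0), (1.6, 0.8, 7.0)]

result = process_items(
    samples,
    # Transform: scale the reading (a stand-in for any per-item correction).
    transform=lambda s: (s[0], s[1], s[2] * 2.0),
    # Map: assign the item to a coarse output cell of width 1 in x.
    map_to_grid=lambda t: ((int(t[0]), int(t[1])), t[2]),
    # Aggregate: keep the largest value seen at each output grid point.
    aggregate=max,
    initial=float("-inf"),
)
print(result)
```

Here the aggregation is a per-cell maximum, but any function that combines an accumulated value with a new one (sum, count, a weighted average kept as a pair) fits the same pattern, which is what makes the step amenable to a shared infrastructure.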