T2: A Customizable Parallel Database For Multi-dimensional Data
Date
1998-10-15
Authors
Chang, Chialin
Acharya, Anurag
Sussman, Alan
Saltz, Joel
Abstract
As computational power and storage capacity increase, processing and
analyzing large volumes of multi-dimensional datasets play an increasingly
important part in many domains of scientific research.
Several database research groups and vendors have developed
object-relational database systems to provide some support for
managing and/or visualizing multi-dimensional datasets.
These systems, however, provide little or
no support for analyzing or processing these datasets -- the
assumption is that this is too application-specific to warrant common
support. As a result, applications that process these datasets are
usually decoupled from data storage and management, resulting in
inefficiency due to copying and loss of locality. Furthermore, every
application developer has to implement complex support for managing
and scheduling the processing.
Our study of a large set of scientific applications over the past three
years indicates that the processing for such datasets
is often highly stylized and shares several important characteristics.
Usually, both the input dataset and the result being computed
have underlying multi-dimensional
grids. The basic processing step usually consists of transforming
individual input items, mapping the transformed items to the output
grid and computing output items by aggregating, in some way, all the
transformed input items mapped to the corresponding grid point.
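
The processing step described above (transform, map, aggregate) can be illustrated with a minimal Python sketch. This is not T2's interface: the function `process`, its parameters, the sample data and the 2x2 coarsening are all hypothetical, chosen only to make the pattern concrete on a single machine.

```python
from collections import defaultdict


def process(items, transform, map_to_grid, aggregate, initial):
    """Transform each input item, map it to an output grid point,
    and fold it into the value stored at that point.

    All names here are illustrative assumptions, not part of T2.
    """
    output = defaultdict(lambda: initial)
    for item in items:
        transformed = transform(item)        # per-item transformation
        cell = map_to_grid(transformed)      # output grid point for this item
        output[cell] = aggregate(output[cell], transformed)
    return dict(output)


# Hypothetical input: (x, y, reading) samples on a fine 2-D grid.
samples = [(0, 0, 1.0), (1, 0, 3.0), (2, 3, 2.0), (3, 3, 5.0)]

result = process(
    samples,
    transform=lambda s: (s[0], s[1], s[2] * 2.0),    # scale each reading
    map_to_grid=lambda t: (t[0] // 2, t[1] // 2),    # coarsen to 2x2 cells
    aggregate=lambda acc, t: max(acc, t[2]),         # keep the maximum per cell
    initial=float("-inf"),
)
```

Here the input grid is the fine (x, y) lattice and the output grid is the coarsened one. Swapping the `aggregate` function (for example, to a sum) changes the computation without touching the retrieval loop, which is the kind of customization the abstract points to.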
In this paper,
we present the design of T2, a customizable parallel database
that integrates storage, retrieval and processing of multi-dimensional
datasets. T2 provides support for common operations including
index generation, data retrieval, memory management, scheduling of
processing across a parallel machine and user interaction. It
achieves its primary advantage from the ability to seamlessly
integrate data retrieval and processing for a wide variety of
applications and from the ability to maintain and jointly process
multiple datasets with different underlying grids.
(Also cross-referenced as UMIACS-TR-98-04)