Improving Access to Multi-dimensional Self-describing Scientific Datasets
Abstract
Applications that query into very large multi-dimensional datasets are
becoming more common. Many self-describing scientific data file formats
have also emerged, which have structural metadata to help navigate the
multi-dimensional arrays that are stored in the files. The files may also
contain application-specific semantic metadata. In this paper, we discuss
efficient methods for searching for subsets of multi-dimensional data
objects, using semantic information to build multi-dimensional indexes
and to group data items into properly sized chunks that maximize disk
I/O bandwidth. This work is the first step in the design and
implementation of a generic indexing library that will work with various
high-dimensional scientific data file formats containing semantic
information about the stored data. To validate the approach, we have implemented
indexing structures for NASA remote sensing data stored in the HDF format
with a specific schema (HDF-EOS), and show the performance improvements
that are gained from indexing the datasets, compared to using the existing
HDF library for accessing the data.
(UMIACS-TR-2002-99)