Design of a Framework for Data-Intensive Wide-Area Applications
Date
2000-02-23
Authors
Beynon, Michael D.
Kurc, Tahsin
Sussman, Alan
Saltz, Joel
Abstract
Applications that use collections of very large, distributed datasets
have become an increasingly important part of science and engineering.
With high-performance wide-area networks becoming more pervasive, there
is interest in making collective use of distributed computational and
data resources. Recent work has converged on the notion of the Grid,
which attempts to uniformly present a heterogeneous collection of
distributed resources. Current Grid research covers many areas, from
low-level infrastructure issues to high-level application concerns.
However, providing support for efficient exploration and processing of
very large scientific datasets stored in distributed archival storage
systems remains a challenging research issue.
We have initiated an effort that focuses on developing efficient
data-intensive applications in a Grid environment. In this paper, we
present a framework, called filter-stream programming, that represents
the processing units of a data-intensive application as a set of
filters, which are designed to be efficient in their use of memory and
scratch space. We describe a prototype infrastructure that supports
execution of applications using the proposed framework. We present
the implementation of two applications using the filter-stream
programming framework, and discuss experimental results demonstrating
the effects of heterogeneous resources on application performance.
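The filter-stream idea can be illustrated with a minimal Python sketch. It assumes that each filter runs as its own thread and that a stream is a bounded FIFO of fixed-size data buffers. Because a writer blocks when its output stream is full, each filter holds only a few buffers in memory at a time. The names `Stream`, `source`, `transform`, and `run_pipeline`, as well as the `bufsize` and `capacity` parameters, are hypothetical. They do not represent the API of the prototype infrastructure described in the paper.

```python
import queue
import threading
from typing import Callable, List

_END = object()  # hypothetical sentinel marking end-of-stream


class Stream:
    """Hypothetical bounded stream: a FIFO of data buffers with fixed capacity."""

    def __init__(self, capacity: int = 2) -> None:
        self._q = queue.Queue(maxsize=capacity)

    def write(self, buf: List[int]) -> None:
        self._q.put(buf)  # blocks while the stream is full, bounding memory use

    def close(self) -> None:
        self._q.put(_END)

    def __iter__(self):
        while True:
            buf = self._q.get()
            if buf is _END:
                return
            yield buf


def source(out: Stream, data: List[int], bufsize: int) -> None:
    """Filter that reads the input in fixed-size buffers (stands in for a storage read)."""
    for i in range(0, len(data), bufsize):
        out.write(data[i:i + bufsize])
    out.close()


def transform(inp: Stream, out: Stream, fn: Callable[[int], int]) -> None:
    """Filter that processes one buffer at a time and forwards the result."""
    for buf in inp:
        out.write([fn(x) for x in buf])
    out.close()


def run_pipeline(data: List[int], bufsize: int = 3, capacity: int = 2) -> int:
    """Connect source -> transform -> sink with streams; return the sum of squares."""
    s1, s2 = Stream(capacity), Stream(capacity)
    total = [0]

    def sink() -> None:
        # Keep only a running aggregate, never the whole dataset.
        for buf in s2:
            total[0] += sum(buf)

    threads = [
        threading.Thread(target=source, args=(s1, data, bufsize)),
        threading.Thread(target=transform, args=(s1, s2, lambda x: x * x)),
        threading.Thread(target=sink),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]


if __name__ == "__main__":
    print(run_pipeline(list(range(10))))  # sum of squares of 0..9
```

In this sketch, the bounded queue is what keeps memory use per filter small. In a wide-area setting, the filters could instead be placed on different hosts, with streams carried over the network.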
(Also cross-referenced as UMIACS-TR-2000-04)