Efficient Refreshment of Data Warehouse Views
Date
1998-10-15
Authors
Baekgaard, Lars
Roussopoulos, Nick
Abstract
A data warehouse is a view on a set of
distributed and possibly loosely coupled source databases. For
efficiency reasons, a warehouse should be maintained as a materialized
view. Therefore, efficient incremental algorithms must be used to
periodically refresh the data warehouse. It is possible and desirable
to separate the process of warehouse refreshment from the process of
warehouse use. In this paper we describe and compare view refreshment
algorithms that are based on different combinations of materialized
views, partially materialized views, and pointers. Our contribution is
twofold. First, our algorithms and data structures are designed to
minimize network communication and interactions between the warehouse
and the source databases. The minimal set of data that is
necessary for both warehouse refreshment and warehouse use is stored
on the warehouse. Second, we describe the results of an experiment
comparing these methods with respect to storage overhead and I/O.
Briefly, the experiment shows that algorithms based on a
combination of partially materialized views and pointers outperform
algorithms based on materialized views.
(Also cross-referenced as UMIACS-TR-96-33)
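
The abstract does not spell out the refreshment algorithms themselves, so the following is only a minimal sketch of the general idea of incremental refreshment of a materialized join view, not the authors' method. The relations (orders, customers), the function names full_join and refresh, and the delta format are all hypothetical and chosen purely for illustration. The sketch assumes that only the orders relation changes and that the customers relation is available at the warehouse.

```python
def full_join(orders, customers):
    """Recompute the view from scratch: join orders to customers on cust_id.

    orders    -- set of (order_id, cust_id) tuples (hypothetical source relation)
    customers -- dict mapping cust_id to customer name (hypothetical source relation)
    """
    return {
        (oid, cid, customers[cid])
        for oid, cid in orders
        if cid in customers
    }


def refresh(view, customers, inserted_orders, deleted_orders):
    """Incrementally refresh the materialized view from order deltas only.

    Only the inserted and deleted order tuples are shipped to the
    warehouse; the full orders relation is never re-read.
    """
    # Remove view tuples derived from deleted orders.
    for oid, cid in deleted_orders:
        if cid in customers:
            view.discard((oid, cid, customers[cid]))
    # Add view tuples derived from newly inserted orders.
    for oid, cid in inserted_orders:
        if cid in customers:
            view.add((oid, cid, customers[cid]))
    return view


# Usage: the incrementally refreshed view matches a full recomputation.
orders = {(1, "a"), (2, "b")}
customers = {"a": "Ann", "b": "Bob"}
view = full_join(orders, customers)

inserted = {(3, "a")}
deleted = {(2, "b")}
view = refresh(view, customers, inserted, deleted)

new_orders = (orders - deleted) | inserted
assert view == full_join(new_orders, customers)
```

The point of the sketch is that the work done by refresh is proportional to the size of the deltas rather than the size of the source relations, which is what makes periodic refreshment cheaper than recomputation.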