Efficient Refreshment of Data Warehouse Views
A data warehouse is a view on a set of distributed and possibly loosely coupled source databases. For efficiency reasons, a warehouse should be maintained as a materialized view, so efficient incremental algorithms must be used to refresh it periodically. It is possible and desirable to separate the process of warehouse refreshment from the process of warehouse use. In this paper we describe and compare view refreshment algorithms that are based on different combinations of materialized views, partially materialized views, and pointers. Our contribution is twofold. First, our algorithms and data structures are designed to minimize network communication and interactions between the warehouse and the source databases. Only the minimal set of data necessary for both warehouse refreshment and warehouse use is stored at the warehouse. Second, we describe the results of an experiment comparing these methods with respect to storage overhead and I/O. Briefly, the experiment shows that algorithms based on a combination of partially materialized views and pointers outperform algorithms based on materialized views. (Also cross-referenced as UMIACS-TR-96-33)
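To make the idea of incremental refreshment concrete, the following is a minimal sketch of a generic count-based refresh of a materialized selection and projection view. It is not one of the algorithms compared in the paper, and it does not model partially materialized views or pointers. The view definition (`project`), the row fields, and the function names are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def project(row):
    # Hypothetical view definition (an assumption, not from the paper):
    # keep (customer, region) for source rows whose amount exceeds 100.
    if row["amount"] > 100:
        return (row["customer"], row["region"])
    return None


def full_recompute(source_rows):
    """Build the view from scratch by scanning every source row."""
    view = Counter()
    for row in source_rows:
        key = project(row)
        if key is not None:
            view[key] += 1
    return view


def refresh(view, inserted, deleted):
    """Apply a batch of source changes to the view without rescanning the source.

    Each view tuple carries a derivation count, so a tuple disappears only
    when its last supporting source row has been deleted.
    """
    for row in inserted:
        key = project(row)
        if key is not None:
            view[key] += 1
    for row in deleted:
        key = project(row)
        if key is not None:
            view[key] -= 1
            if view[key] == 0:
                del view[key]
    return view
```

In this sketch only the inserted and deleted rows travel from the source to the warehouse, which illustrates in a simplified way why incremental refreshment can reduce communication compared with recomputing the view.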