Shared Index Scans For Data Warehouses

CS-TR-4233.pdf (639.99 KB)

Tree-based indexing structures such as B-trees, B+-trees, bitmap indexes and R-trees have become essential for good performance when accessing vast datasets. However, most database research seems to ignore how the disk hardware behaves during index scans. In this paper we aim to refocus attention on using the underlying hardware efficiently during concurrent index scans. We propose a new "transcurrent execution model" (TEM) for concurrent user queries against tree indexes. Our model exploits the intra-parallelism of the index scan and dynamically decomposes each query into a set of disjoint "query patches". TEM integrates the ideas of prefetching and shared scans in a new framework suitable for dynamic multi-user environments. It supports time constraints in the scheduling of these patches and introduces the notion of data flow to achieve steady progress for all queries. Our experiments demonstrate that transcurrent query execution results in high locality of I/O, which in turn translates to substantial performance benefits in query execution time, buffer hit ratio and disk throughput. These benefits increase as the workload in the warehouse increases, offering a highly scalable solution to the I/O problem of data warehouses. (Cross-referenced as UMIACS-TR-2001-22)
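To make the idea of disjoint patches and shared scans concrete, the following is a minimal illustrative sketch and not the algorithm from the report. It assumes each query is a half-open key range over a one-dimensional index, and it splits overlapping ranges into disjoint pieces so that each piece would be read once and delivered to every query that needs it. The function name `decompose_into_patches`, the query identifiers, and the key ranges are all hypothetical.

```python
def decompose_into_patches(queries):
    """Split overlapping key ranges into disjoint patches.

    queries: dict mapping a query id to a half-open key range (lo, hi).
    Returns a list of (lo, hi, query_ids) tuples in key order, where
    query_ids is the frozenset of queries that need that patch.
    Hypothetical helper for illustration only.
    """
    # Every range endpoint is a potential patch boundary.
    bounds = sorted({b for lo, hi in queries.values() for b in (lo, hi)})
    patches = []
    for lo, hi in zip(bounds, bounds[1:]):
        # A query needs this patch if its range fully covers it.
        interested = frozenset(
            q for q, (qlo, qhi) in queries.items() if qlo <= lo and hi <= qhi
        )
        if interested:  # skip gaps that no query asked for
            patches.append((lo, hi, interested))
    return patches


# Three concurrent range queries with overlapping key ranges (made-up values).
queries = {"q1": (0, 50), "q2": (30, 80), "q3": (40, 60)}
patches = decompose_into_patches(queries)

# Keys read when each patch is scanned once and shared among queries,
# versus keys read when every query scans its own range independently.
shared_cost = sum(hi - lo for lo, hi, _ in patches)
independent_cost = sum(hi - lo for lo, hi in queries.values())
```

In this toy setting the overlap among the three queries means the shared scan touches fewer keys than three independent scans would. A real scheduler would additionally have to order the patches, respect time constraints, and account for buffer and disk behavior, none of which this sketch models.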