The Dwarf Data Cube Eliminates the High Dimensionality Curse

Files

CS-TR-4552.pdf (246.26 KB)
No. of downloads: 824

Date

2003-12-18

Abstract

The data cube operator encapsulates all possible groupings of a data set and has proved to be an invaluable tool in analyzing vast amounts of data. However, its apparent exponential complexity has significantly limited its applicability to low-dimensional datasets. Recently the dwarf data cube model was introduced, and it was shown that high-dimensional "dwarf data cubes" are orders of magnitude smaller in size than the original data cubes, even when they calculate and store every possible aggregation with 100% precision. In this paper we present a surprising analytical result proving that the size of dwarf cubes grows polynomially with the dimensionality of the data set and, therefore, a full data cube at 100% precision is not inherently cursed by high dimensionality. This striking result of polynomial complexity reformulates the context of cube management and redefines most of the problems associated with data warehousing and On-Line Analytical Processing. We also develop an efficient algorithm for estimating the size of dwarf data cubes before actually computing them. Finally, we complement our analytical approach with an experimental evaluation using real and synthetic data sets, and demonstrate our results.

UMIACS-TR-2003-120
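As a point of reference for the exponential blow-up the abstract refers to, the Python sketch below materializes every GROUP BY combination of a tiny, made-up fact table: d dimensions yield 2^d aggregate views, which is why a naively materialized full cube appears exponentially complex. The fact table in rows, the dimension names in dims, and the SUM measure are illustrative assumptions; this is not the paper's Dwarf construction or its size-estimation algorithm.

from itertools import combinations
from collections import defaultdict

# Hypothetical fact table: three dimensions (store, product, customer) and a sales measure.
rows = [
    ("S1", "P1", "C2", 70),
    ("S1", "P2", "C3", 40),
    ("S2", "P1", "C1", 90),
    ("S2", "P1", "C3", 50),
]
dims = ("store", "product", "customer")

# A full data cube materializes one aggregate view per subset of the dimensions.
cube = {}
for k in range(len(dims) + 1):
    for group_by in combinations(range(len(dims)), k):
        agg = defaultdict(int)
        for row in rows:
            key = tuple(row[i] for i in group_by)  # project the row onto the group-by
            agg[key] += row[-1]                    # SUM(sales)
        cube[tuple(dims[i] for i in group_by)] = dict(agg)

print(len(cube), "group-bys for", len(dims), "dimensions")  # 2^3 = 8 views
print(cube[("store",)])  # {('S1',): 110, ('S2',): 140}
print(cube[()])          # grand total: {(): 250}

The number of views doubles with every added dimension, while the dwarf representation analyzed in the report avoids storing this redundancy explicitly, which is what makes its polynomial size growth notable.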
