The Dwarf Data Cube Eliminates the Highy Dimensionality Curse

Sismanis, Yannis; Roussopoulos, Nick

The Dwarf Data Cube Eliminates the Highy Dimensionality Curse

Files

CS-TR-4552.pdf (246.26 KB)

No. of downloads: 848

Date

2003-12-18

Authors

Sismanis, Yannis

Roussopoulos, Nick

Abstract

The data cube operator encapsulates all possible groupings of a data set and has proved to be an invaluable tool in analyzing vast amounts of data. However its apparent exponential complexity has significantly limited its applicability to low dimensional datasets. Recently the idea of the dwarf data cube model was introduced, and showed that high-dimensional ``dwarf data cubes'' are orders of magnitudes smaller in size than the original data cubes even when they calculate and store every possible aggregation with 100% precision. In this paper we present a surprising analytical result proving that the size of dwarf cubes grows polynomially with the dimensionality of the data set and, therefore, a full data cube at 100% precision is not inherently cursed by high dimensionality. This striking result of polynomial complexity reformulates the context of cube management and redefines most of the problems associated with data-warehousing and On-Line Analytical Processing. We also develop an efficient algorithm for estimating the size of dwarf data cubes before
actually computing them. Finally, we complement our analytical approach
with an experimental evaluation using real and synthetic data sets, and demonstrate our results. UMIACS-TR-2003-120

URI (handle)

http://hdl.handle.net/1903/1333

Collections

Technical Reports from UMIACS
Technical Reports of the Computer Science Department

Full item page