The Dwarf Data Cube Eliminates the High Dimensionality Curse
Abstract
The data cube operator encapsulates all possible groupings of a
data set and has proved to be an invaluable tool in analyzing vast amounts
of data. However, its apparent exponential complexity has significantly
limited its applicability to low-dimensional datasets. Recently, the dwarf
data cube model was introduced and shown to yield high-dimensional ``dwarf
data cubes'' that are orders of magnitude smaller than the original data
cubes, even when they compute and store every possible aggregation with
100\% precision.
In this paper we present a surprising analytical result proving
that the size of dwarf cubes grows polynomially with the
dimensionality of the data set and, therefore, a full data cube at 100\%
precision is not inherently cursed by high dimensionality. This striking
result of polynomial complexity reformulates the context of cube
management and redefines most of the problems associated with
data warehousing and On-Line Analytical Processing. We also develop an
efficient algorithm for estimating the size of dwarf data cubes before
actually computing them. Finally, we complement our analytical approach
with an experimental evaluation using real and synthetic data sets that
demonstrates our results.
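
To put the claim in perspective, the following back-of-the-envelope comparison contrasts the two growth rates; the symbol $d$ for the number of dimensions and the generic exponent $k$ are notational choices made here for illustration and are not the paper's exact bound.
\[
\text{group-bys in a full cube over } d \text{ dimensions} \;=\; \sum_{i=0}^{d} \binom{d}{i} \;=\; 2^{d},
\qquad
\text{dwarf cube size} \;=\; O\!\left(d^{\,k}\right) \text{ for some constant } k.
\]
For instance, at $d = 30$ a naive materialization already comprises $2^{30} \approx 10^{9}$ group-bys, whereas a polynomial bound keeps the dwarf representation tractable as dimensionality grows.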
UMIACS-TR-2003-120