Beyond Uniformity and Independence: Analysis of R-Trees Using the Concept of Fractal Dimension
Abstract
We propose the concept of fractal dimension of a set of points,
in order to quantify its deviation from the uniform distribution.
Using measurements on real data sets (road intersections of U.S. counties,
star coordinates from NASA's International Ultraviolet Explorer, etc.),
we provide evidence that real data are indeed skewed,
and, moreover, we show that they behave as mathematical fractals,
with a measurable, non-integer fractal dimension.
Armed with this tool, we then show its practical use in predicting
the performance of spatial access methods,
and specifically of R-trees.
We provide the {\em first} analysis of R-trees for skewed distributions
of points:
we develop a formula that estimates the number of disk accesses
for range queries, given only
the fractal dimension
of the point set and the number of points it contains.
Experiments on real data sets show that the formula is very accurate:
the relative error is usually below 5\%,
and it rarely exceeds 10\%.
We believe that the fractal dimension will help replace the uniformity
and independence assumptions,
allowing more accurate analysis
for {\em any} spatial access method,
as well as better estimates
for query optimization on multi-attribute queries.
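
To make the notion of a measurable fractal dimension concrete, the following is a minimal sketch of a box-counting estimate for a two-dimensional point set. It is an illustration only: the box-counting estimator, the function names (`box_count`, `box_counting_dimension`), the synthetic data, and the chosen cell sizes are all assumptions of this sketch, not necessarily the measurement procedure or the data used in the paper. The idea is to cover the points with grids of shrinking cell side r, count the occupied cells N(r), and take the slope of log N(r) against log(1/r).

```python
import math
import random


def box_count(points, cell):
    """Count the grid cells of side `cell` that contain at least one point."""
    return len({(math.floor(x / cell), math.floor(y / cell)) for x, y in points})


def box_counting_dimension(points, cell_sizes):
    """Least-squares slope of log N(r) against log(1/r) over the given cell sizes."""
    xs = [math.log(1.0 / r) for r in cell_sizes]
    ys = [math.log(box_count(points, r)) for r in cell_sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic (hypothetical) test data, not the data sets from the paper.
rng = random.Random(0)
line_points = [(t, t) for t in (rng.random() for _ in range(20000))]
square_points = [(rng.random(), rng.random()) for _ in range(20000)]

cell_sizes = [1 / 4, 1 / 8, 1 / 16, 1 / 32]
line_dim = box_counting_dimension(line_points, cell_sizes)
square_dim = box_counting_dimension(square_points, cell_sizes)
print(f"points on a line segment: {line_dim:.2f}")
print(f"uniform points in a square: {square_dim:.2f}")
```

Points on a line segment should give an estimate close to 1 and uniformly scattered points an estimate close to 2. A skewed real data set of the kind described above would be expected to fall between such integer values, and the estimate depends on choosing cell sizes within the range where the log-log plot is roughly linear.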
NOTE - Appeared in PODS 1994. Christos Faloutsos and Ibrahim Kamel.
"Beyond Uniformity and Independence: Analysis of R-Trees Using the
Concept of Fractal Dimension", Proc. ACM SIGACT-SIGMOD-SIGART
PODS. Minneapolis, MN (May 1994), pp. 4-13.
(Also cross-referenced as UMIACS-TR-93-130)