University of Maryland DRUM  
University of Maryland Digital Repository at the University of Maryland

DRUM >
College of Computer, Mathematical & Natural Sciences >
Computer Science >
Technical Reports from UMIACS >

Please use this identifier to cite or link to this item: http://hdl.handle.net/1903/1186

Title: Dwarf: Shrinking the PetaCube
Authors: Sismanis, Yannis
Deligiannakis, Antonios
Roussopoulos, Nick
Kotidis, Yannis
Type: Technical Report
Issue Date: 4-Apr-2002
Series/Report no.: UM Computer Science Department; CS-TR-4342
UMIACS; UMIACS-TR-2002-24
Abstract: Dwarf is a highly compressed structure for computing, storing, and querying data cubes. Dwarf identifies prefix and suffix structural redundancies and factors them out by coalescing their store. Prefix redundancy is high on dense areas of cubes but suffix redundancy is significantly higher for sparse areas. Putting the two together fuses the exponential sizes of high dimensional full cubes into a dramatically condensed data structure. The elimination of suffix redundancy has an equally dramatic reduction in the computation of the cube because recomputation of the redundant suffixes is avoided. This effect is multiplied in the presence of correlation amongst attributes in the cube. A Petabyte 25-dimensional cube was shrunk this way to a 2.3GB Dwarf Cube, in less than 20 minutes, a 1:400000 storage reduction ratio. Still, Dwarf provides 100% precision on cube queries and is a self-sufficient structure which requires no access to the fact table. What makes Dwarf practical is the automatic discovery, in a single pass over the fact table, of the prefix and suffix redundancies without user involvement or knowledge of the value distributions. This paper describes the Dwarf structure and the Dwarf cube construction algorithm. Further optimizations are then introduced for improving clustering and query performance. Experiments with the current implementation include comparisons on detailed measurements with real and synthetic datasets against previously published techniques. The comparisons show that Dwarfs by far outperform these techniques on all counts: storage space, creation time, query response time, and updates of cubes. Also UMIACS-TR-2002-24
URI: http://hdl.handle.net/1903/1186
Appears in Collections:Technical Reports of the Computer Science Department
Technical Reports from UMIACS

Files in This Item:

File Description SizeFormatNo. of Downloads
CS-TR-4342.pdf145.12 kBAdobe PDF439View/Open

All items in DRUM are protected by copyright, with all rights reserved.

 

DRUM is brought to you by the University of Maryland Libraries
University of Maryland, College Park, MD 20742-7011 (301)314-1328.
Please send us your comments. -
All Contents