Compiler Supported High-level Abstractions for Sparse Disk-Resident Datasets
Abstract
Processing and analysing large volumes of data plays an increasingly important role in many domains of scientific research. The complexity and irregularity of datasets in many domains make the task of developing such processing applications tedious and error-prone.
We propose the use of high-level abstractions to hide the irregularities in these datasets and enable rapid development of correct, but not necessarily efficient, data processing applications. We present two execution strategies and a set of compiler analysis techniques for obtaining high performance from applications written using these abstractions. Our execution strategies achieve high locality in disk accesses: once a disk block is read from disk, all iterations that read any element of that block are performed. To support these strategies and improve performance, we have developed static analysis techniques for: 1) computing the set of iterations that access a particular right-hand-side element, 2) generating a function that can be applied to the meta-data associated with each disk block to determine whether that block needs to be read, and 3) performing code hoisting of conditionals.
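
The block-driven execution idea can be illustrated with a small sketch. This is not the report's compiler output or API. It assumes a hypothetical loop `for i in range(q_lo, q_hi): out[i] = a[i] + a[i+1]` over a sparse array `a`, a hypothetical `Block` type whose meta-data is the minimum and maximum element index it stores, and hand-written versions of the functions a compiler might derive (`needs_read` for the meta-data test, `iterations_reading` for the set of iterations that read a given right-hand-side element).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Block:
    """Hypothetical disk block: sparse elements plus min/max index meta-data."""
    lo: int                  # smallest element index the block may hold
    hi: int                  # largest element index the block may hold
    data: Dict[int, float]   # element index -> value (absent means zero)


def needs_read(block: Block, q_lo: int, q_hi: int) -> bool:
    """Meta-data test: the loop reads a[q_lo] .. a[q_hi], so a block is
    needed only if its index range overlaps that interval."""
    return block.lo <= q_hi and block.hi >= q_lo


def iterations_reading(elem: int, q_lo: int, q_hi: int) -> List[int]:
    """Iterations i in [q_lo, q_hi) that read a[elem], given that
    iteration i reads a[i] and a[i + 1]."""
    return [i for i in (elem - 1, elem) if q_lo <= i < q_hi]


def run(blocks: List[Block], q_lo: int, q_hi: int) -> Tuple[Dict[int, float], int]:
    """Visit each needed block once and perform every iteration that reads
    one of its elements. Returns the output and the number of blocks read."""
    out = {i: 0.0 for i in range(q_lo, q_hi)}
    blocks_read = 0
    for block in blocks:
        if not needs_read(block, q_lo, q_hi):
            continue  # skipped using meta-data only, no disk access
        blocks_read += 1  # stands in for the actual disk read
        for elem, value in block.data.items():
            for i in iterations_reading(elem, q_lo, q_hi):
                out[i] += value
    return out, blocks_read


blocks = [
    Block(0, 3, {0: 1.0, 3: 2.0}),
    Block(4, 7, {5: 4.0}),
    Block(8, 11, {9: 8.0}),
]
result, blocks_read = run(blocks, 2, 6)
```

In this sketch the third block is rejected by its meta-data alone, and each block that is read is consumed completely before moving on, which is the locality property described above. The accumulation into `out` works here only because the assumed loop body is a sum; other loop bodies would need a different way to combine partial results.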
Cross-referenced as UMIACS-TR-2001-50