Institute for Systems Research Technical Reports

Permanent URI for this collectionhttp://hdl.handle.net/1903/4376

This archive contains a collection of reports generated by the faculty and students of the Institute for Systems Research (ISR), a permanent, interdisciplinary research unit in the A. James Clark School of Engineering at the University of Maryland. ISR-based projects are conducted through partnerships with industry and government, bringing together faculty and students from multiple academic departments and colleges across the university.

Browse

Search Results

Now showing 1 - 5 of 5
  • Thumbnail Image
    Item
    Declustering R-Tree on Multi-Computer Architectures
    (1994) Koudas, N.; Faloutsos, Christos; Kamel, Ibrahim; ISR
    We study a method to decluster a spatial access method (and specifically an R-tree) on a shared-nothing multi-computer architecture [9]. Our first step is to propose a software architecture, with the top levels of the R-tree on the 'master'server' and the leaf nodes distributed across the servers. Nest, we study the optimal capacity of leaf nodes, or 'chunk size'. We express the response time on range queries as a function of the 'chunk size', and we show how to optimize it. This formula assumes that the 'chunks' are perfectly declustered. We propose to use the Hilbert curve to achieve such a good declustering.

    Finally, we implemented our method on a network of workstations and we compared the experimental and the theoretical results. The conclusions are that (a) our formula for the response time is accurate (the maximum relative error was 30%; the typical error was in the vicinity of 10-15%) (b) the Hilbert-based declustering consistently outperforms a random declustering (c) most importantly, although the optimal chunk size depends on several factors (database size, size of the query, speed of the network), a safe choice for it is 1 page (whichever is the page size of the operating system). We show analytically and experimentally that a chunk size of 1 page gives either optimal or close to optimal results, for a wide range of the parameters.

  • Thumbnail Image
    Item
    The Time Index+ : An Incremental Access Structure for Temporal Databases
    (1994) Kouramajian, Vram; Kamel, Ibrahim; Elmasri, Ramez; Waheed, Syed; ISR
    In this paper, we propose a new indexing structure, called the Time Index+, which extends the incremental structure technique introduced in the Time Index [ElWK90, ElKG93]. The Time Index performs well for data that often overlaps and has a non-- uniform distribution. However, it requires huge amounts of storage and suffers from degradation in update performance. The Time Index+ overcomes the deficiencies of the Time Index by proposing an efficient new storage model for partitioning logical buckets and by suggesting a graceful new method for handling object versions with long and very long time intervals.

    We validate our claims for the efficiency of our new techniques by analyzing and, comparing four indexing structures: the Time Index$^{+}$, the Time Index, the, Packed R--Tree~[RoLe85, KaFa93], and the Parameterized R--Tree. Our simulation identifies important parameters, and shows how they affect the performance of the four considered indexing structures. These include mean of version lifespan, block size, query time interval length, and total number of versions. Our simulation results show that: (1) The Time Index+ requires on average 60% less storage than the the Time Index but 50% more storage than the Packed R--Tree and the Parameterized R--Tree and (2) the Time Index+ provides an improvement in search time of 10% over the Time Index, of an order of magnitude over the Packed R--Tree, and of at least 100% over the Parameterized R--Tree.

  • Thumbnail Image
    Item
    Beyond Uniformity and Independence: Analysis of R-trees Using the Concept of Fractal Dimension
    (1993) Faloutsos, Christos; Kamel, Ibrahim; ISR
    We propose the concept of fractal dimension of a set of points, in order to quantify the deviation from the uniformity distribution. Using measurements on real data sets (road intersections of U.S. counties, star coordinates from NASA's Infrared-Ultraviolet Explorer etc.) we provide evidence that real data indeed are skewed, and, moreover, we show that they behave as mathematical fractals, with a measurable, non-integer fractal dimension.

    Armed with this tool, we then show its practical use in predicting the performance of spatial access methods, and specifically of the R-trees. We provide the first analysis of R- trees for skewed distributions of points: We develop a formula that estimates the number of disk accesses for range queries, given only the fractal dimension of the point set, and its count. Experiments on real data sets show that the formula is very accurate: the relative error is usually below 5%, and it rarely exceeds 10%.

    We believe that the fractal dimension will help replace the uniformity and independence assumptions, allowing more accurate analysis for any spatial access method, as well as better estimates for query optimization on multi-attribute queries.

  • Thumbnail Image
    Item
    Hilbert R-Tree: An Improved R-Tree Using Fractals
    (1993) Kamel, Ibrahim; Faloutsos, Christos; ISR
    We propose a new R-tree structure that outperforms all the older ones. The heart of the idea is to impose a linear ordering on the data rectangles. This ordering has to be 'good', in the sense that it should cluster 'similar' data rectangles together, to minimize the area and perimeter of the resulting minimum bounding rectangles (MBRs). Among the orderings we tried, the '2D-c' method, the one that uses the (2d) hilbert value of the center of the rectangles, gives the best results.

    For a static database, the proposed ordering achieves superior packing, outperforming older packing methods [25], and the best dynamic method (R*-trees [3]). The savings are as high as 36% on real data.

    more importantly, we introduce a dynamic variation, the Hilbert R- tree: : Given the ordering, every node has a well- defined set of sibling nodes; thus, we can deploy the deferred splitting algorithms of the B* -trees. By adjusting the split policy, the Hilbert R-tree can achieve as high utilization as desired. We show that a '3-to-4' split policy achieves good results, consistently outperforming the R* -trees, with up to 28% savings on real data.

  • Thumbnail Image
    Item
    Packed R-trees Using Fractals
    (1993) Faloutsos, Christos; Kamel, Ibrahim; ISR
    We propose a new packing technique for R-trees for static databases. Given a collection of rectangles, we sort them and we build the R-tree bottom-up. There are several ways to sort the rectangles; the innovation of this work is the use of fractals, and specifically the hilbert curve, to achieve better ordering of the rectangles and eventually better packing. We proposed and implemented several variations and performed experiments on synthetic, as well as real data ( a TIGER file from the U.S. Bureau of Census). The winning variation ('2D-c') was the one that sorts the rectangles according to the hilbert value of the center. This variation consistently outperforms the R-trees [3], Guttman's R-trees [13], as well as the packing method of Roussoupoulos and Leifker [24]. The performance gain of the our method seems to increase with the skeweness of the data distribution; specifically, on the (highly skewed) TIGER dataset, it achieves up to 36% improvement in response time over the best know R-tree variant and up to 58% over the older packing algorithm.