Show simple item record


dc.contributor.advisorSussman, Alanen_US
dc.contributor.authorNam, Beomseoken_US
dc.description.abstractScientific data analysis applications require large scale computing power to effectively service client queries and also require large storage repositories for datasets that are generated continually from sensors and simulations. These scientific datasets are growing in size every day, and are becoming truly enormous. The goal of this dissertation is to provide efficient multidimensional indexing techniques that aid in navigating distributed scientific datasets. In this dissertation, we show significant improvements in accessing distributed large scientific datasets. The first approach we took to improve access to subsets of large multidimensional scientific datasets, was data chunking. The contents of scientific data files typically are a collection of multidimensional arrays, along with the corresponding metadata. Data chunking groups data elements into small chunks of a fixed, but data-specific, size to take advantage of spatio-temporal locality since it is not efficient to index individual data elements of large scientific datasets. The second approach was the design of an efficient multidimensional index for scientific datasets. This work investigates how existing multidimensional indexing structures perform on chunked scientific datasets, and compares their performance with that of our own indexing structure, SH-trees. Since R-trees were proposed, various multidimensional indexing structures have been proposed. However, there are a relatively small number of studies focused on improving the performance of indexing geographically distributed datasets, especially across heterogeneous machines. As a third approach, in an attempt to accelerate indexing performance for distributed datasets, we proposed several distributed multidimensional indexing schemes: replicated centralized indexing, hierarchical two level indexing, and decentralized two level indexing. Our experimental results show that great performance improvements are gained from distribution of multidimensional index. However, the design choices for distributed indexing, such as replication, partitioning, and decentralization, must be carefully considered since they may decrease the overall performance in certain situations. Therefore, this work provides performance guidelines to aid in selecting the best distributed multidimensional indexing scheme for various systems and applications. Finally, we describe how a distributed multidimensional indexing scheme can be used by a distributed multiple query optimization middleware as a case-study application to generate better query plans by leveraging information about the contents of remote caches.en_US
dc.format.extent1963058 bytes
dc.contributor.publisherDigital Repository at the University of Marylanden_US
dc.contributor.publisherUniversity of Maryland (College Park, Md.)en_US
dc.contributor.departmentComputer Scienceen_US
dc.subject.pqcontrolledComputer Scienceen_US
dc.subject.pquncontrolledDistributed Systemsen_US
dc.subject.pquncontrolledMultidimensional Indexingen_US
dc.subject.pquncontrolledScientific Datasetsen_US
dc.subject.pquncontrolledDistributed Indexingen_US
dc.subject.pquncontrolledQuery Optimizationen_US

Files in this item


This item appears in the following Collection(s)

Show simple item record