System support for keyword-based search in structured Peer-to-Peer systems
MetadataShow full item record
In this dissertation, we present protocols for building a distributed search infrastructure over structured Peer-to-Peer systems. Unlike existing search engines which consist of large server farms managed by a centralized authority, our approach makes use of a distributed set of end-hosts built out of commodity hardware. These end-hosts cooperatively construct and maintain the search infrastructure. The main challenges with distributing such a system include node failures, churn, and data migration. Localities inherent in query patterns also cause load imbalances and hot spots that severely impair performance. Users of search systems want their results returned quickly, and in ranked order. Our main contribution is to show that a scalable, robust, and distributed search infrastructure can be built over existing Peer-to-Peer systems through the use of techniques that address these problems. We present a decentralized scheme for ranking search results without prohibitive network or storage overhead. We show that caching allows for efficient query evaluation and present a distributed data structure, called the View Tree, that enables efficient storage, and retrieval of cached results. We also present a lightweight adaptive replication protocol, called LAR that can adapt to different kinds of query streams and is extremely effective at eliminating hotspots. Finally, we present techniques for storing indexes reliably. Our approach is to use an adaptive partitioning protocol to store large indexes and employ efficient redundancy techniques to handle failures. Through detailed analysis and experiments we show that our techniques are efficient and scalable, and that they make distributed search feasible.