Browsing by Author "Franklin, Michael J."
Now showing 1 - 10 of 10
Results Per Page
Sort Options
Item Broadcast Disks: Data Management for Asymmetric Communication Environments(1998-10-15) Acharya, Swarup; Alon, Rafael; Franklin, Michael J.; Zdonik, StanleyThis paper proposes the use of repetitive broadcast as a way of augmenting the memory hierarchy of clients in an asymmetric communication environment. We describe a new technique called "Broadcast Disks" for structuring the broadcast in a way that provides improved performance for non-uniformly accessed data. The Broadcast Disk superimposes multiple disks spinning at different speeds on a single broadcast channel in effect creating an arbitrarily fine-grained memory hierarchy. In addition to proposing and defining the mechanism, a main result of this work is that exploiting the potential of the broadcast structure requires a reevaluation of basic cache management policies. We examine several "pure" cache management policies and develop and measure implementable approximations to these policies. These results and others are presented in a set of simulation studies that substantiates the basic idea and develops some of the intuitions required to design a particular broadcast program. (Also cross-referenced as UMIACS-TR-94-120)Item Dynamic Query Operator Scheduling for Wide-Area Remote Access(1998-10-15) Amsaleg, Laurent; Franklin, Michael J.; Tomasic, AnthonyDistributed databases operating over wide-area networks such as the Internet, must deal with the unpredictable nature of the performance of communication. The response times of accessing remote sources can vary widely due to network congestion, link failure, and other problems. In such an unpredictable environment, the traditional iterator-based query execution model performs poorly. We have developed a class of methods, called query scrambling, for dealing explicitly with the problem of unpredictable response times. Query scrambling dynamically modifies query execution plans on-the-fly in reaction to unexpected delays in data access. In this paper we focus on the dynamic scheduling of query operators in the context of query scrambling. We explore various choices for dynamic scheduling and examine, through a detailed simulation, the effects of these choices. Our experimental environment considers pipelined and non-pipelined join processing in a client with multiple remote data sources and delayed or possibly bursty arrivals of data. Our performance results show that scrambling rescheduling is effective in hiding the impact of delays on query response time for a number of different delay scenarios. (Also cross-referenced as UMIACS- TR-97-54Item Efficient Incremental Garbage Collection for Workstation/Server Database Systems(1998-10-15) Amsaleg, Laurent; Franklin, Michael J.; Gruber, OlivierWe describe an efficient server-based algorithm for garbage collecting object-oriented databases in a workstation/server environment. The algorithm is incremental and runs concurrently with client transactions, however, it does not hold any locks on data and does not require callbacks to clients. It is fault tolerant, but performs very little logging. The algorithm has been designed to be integrated into existing OODB systems, and therefore it works with standard implementation techniques such as two-phase locking and write-ahead-logging. In addition, it supports client-server performance optimizations such as client caching and flexible management of client buffers. We describe an implementation of the algorithm in the EXODUS storage manager and present results from an initial performance study of the implementation. These results demonstrate that the introduction of the garbage collector adds minimal overhead to client operations . (Also cross-referenced as UMIACS-TR-94-121)Item Flexible User Profiles for Large Scale Data Delivery(1999-03-30) Cetintemel, Ugur; Franklin, Michael J.; Giles, C. LeePush-based data delivery requires knowledge of user interests for making scheduling, bandwidth allocation, and routing decisions. Such information is maintained as user profiles. We propose a new incremental algorithm for constructing user profiles based on monitoring and user feedback. In contrast to earlier approaches, which typically represent profiles as a single weighted interest vector, we represent user-profiles using multiple interest clusters, whose number, size, and elements change adaptively based on user access behavior. This flexible approach allows the profile to more accurately represent complex user interests. The approach can be tuned to trade off profile complexity and effectiveness, making it suitable for use in large-scale information filtering applications such as push-based WWW page dissemination. We evaluate the method by experimentally investigating its ability to categorize WWW pages taken from Yahoo! categories. Our results show that the method can provide high retrieval effectiveness with modest profile sizes and can effectively adapt to changes in users' interests. Also cross-referenced as UMIACS-TR-99-18Item Partitioning vs. Replication for Token-Based Commodity(2000-09-12) Cetintemel, Ugur; Ozden, Banu; Franklin, Michael J.; Silberschatz, AviThe proliferation of e-commerce has enabled a new set of applications that allow globally distributed purchasing of commodities such as books, CDs, travel tickets, etc., over the Internet. These commodities can be represented on line by tokens, which can be distributed among servers to enhance the performance and availability of such applications. There are two main approaches for distributing such tokens ? replication and partitioning. Token replication requires expensive distributed synchronization protocols to provide data consistency, and is subject to both high latency and blocking in case of network partitions. On the other hand, token partitioning allows many transactions to execute locally without any global synchronization, which results in low latency and immunity against network partitions. In this paper, we examine the Data-Value Partitioning (DVP) approach to token-based commodity distribution. We propose novel DVP strategies that vary in the way they redistribute tokens among the servers of the system. Using a detailed simulation model and real Internet message traces, we investigate the performance of our DVP strategies by comparing them against a previously proposed scheme, Generalized Site Escrow (GSE), which is based on replication and escrow transactions. Our experiments demonstrate that, for the types of applications and environment we address, replication-based approaches are neither necessary nor desirable, as they inherently require quorum synchronization to maintain consistency. We show that DVP, primarily due to its ability to provide high server autonomy, performs favorably in all cases studied. (Also cross-referenced as UMIACS-TR-2000-62Item Query Scrambling for Bursty Data Arrival.(1998-10-15) Amsaleg, Laurent; Franklin, Michael J.; Tomasic, A.Distributed databases operating over wide-area networks, such as the Internet, must deal with the unpredictable nature of the performance of communication. The response times of accessing remote sources may vary widely due to network congestion, link failure, and other problems. In this paper we examine a new class of methods, called query scrambling, for dealing with unpredictable response times. Query scrambling dynamically modifies query execution plans on-the-fly in reaction to unexpected delays in data access. We explore various choices in the implementation of these methods and examine, through a detailed simulation, the effects of these choices. Our experimental environment considers pipelined and non-pipelined join processing in a client with multiple remote data sources and it focuses on bursty arrivals of data. We identify and study a number of the basic trade-offs that arise when designing scrambling policies for the bursty environment. Our performance results show that query scrambling is effective in hiding the impact of delays on query response time for a number of different delay scenarios. (Also cross-referenced as UMIACS-TR-96-84)Item Scrambling Query Plans to Cope With Unexpected Delays(1998-10-15) Amsaleg, Laurent; Franklin, Michael J.; Tomasic, A.; Urhan., T.Accessing numerous widely-distributed data sources poses significant new challenges for query optimization and execution. Congestion or failure in the network introduce highly-variable response times for wide-area data access. This paper is an initial exploration of solutions to this variability. We investigate a class of dynamic, run-time query plan modification techniques that we call query plan scrambling. We present an algorithm which modifies execution plans on-the-fly in response to unexpected delays in data access. The algorithm both reschedules operators and introduces new operators into the plan. We present simulation results that show how our technique effectively hides delays in receiving the initial requested tuples from remote data sources. (Also cross-referenced as UMIACS-TR-96-35)Item A Study of Query Execution Strategies for Client-Server Database Systems(1998-10-15) Kossmann, Donald; Franklin, Michael J.Query processing in a client-server database system raises the question of where to execute queries to minimize the communication costs and response time of a query, and to load-balance the system. This paper evaluates the two common query execution strategies, data shipping and query shipping, and a policy referred to as hybrid shipping. Data shipping determines that queries be executed at clients; query shipping determines that queries be executed at servers; and hybrid shipping provides the flexibility to execute queries at clients and servers. The experiments with a client-server model confirm that the query execution policy is critical for the performance of a system. Neither data nor query shipping are optimal in all situations, and the performance penalities can be substantial. Hybrid shipping at least matches the best performance of data and query shipping and shows better performance than both in many cases. The performance of hybrid shipping plans, however, is shown to be sensitive to changes in the state of the system (e.g., the load of machines and the contents of caches). Initial experiments indicate that an extended version of a 2-step optimization may be an effective strategy for adjusting plans according to the state of the system at runtime. (Also cross-referenced as UMIACS-TR-95-85)Item Transactional Client-Server Cache Consistency: Alternatives and Performance(1998-10-15) Franklin, Michael J.; Carey, Michael J.; Livny, MironClient-server database systems based on a page server model can exploit client memory resources by caching copies of pages across transaction boundaries. Caching reduces the need to obtain data from servers or other sites on the network. In order to ensure that such caching does not result in the violation of transaction semantics, a cache consistency maintenance algorithm is required. Many such algorithms have been proposed in the literature and, as all provide the same functionality, performance is a primary concern in choosing among them. In this paper we provide a taxonomy that describes the design space for transactional cache consistency maintenance algorithms and show how proposed algorithms relate to one another. We then investigate the performance of six of these algorithms, and use these results to examine the tradeoffs inherent in the design choices identified in the taxonomy. The insight gained in this manner is then used to reflect upon the characteristics of other algorithms that have been proposed. The results show that the interactions among dimensions of the design space can impact performance in many ways, and that classifications of algorithms as simply Pessimistic" or Optimistic" do not accurately characterize the similarities and differences among the many possible cache consistency algorithms. (Also cross-referenced as UMIACS-TR-95-84)Item XJoin: Getting Fast Answers From Slow and Bursty Networks(1999-02-26) Urhan, Tolga; Franklin, Michael J.The combination of increasingly ubiquitous Internet connectivity and advances in heterogeneous and semi-structured databases has the potential to enable database-style querying over data from sources distributed around the world. Traditional query processing techniques, however, fail to deliver acceptable performance in such a scenario for two main reasons: First, they optimize for delivery of the entire query result, while on-line users would typically benefit from receiving initial results as quickly as possible. Second, slow or bursty delivery of data from remote sources can stall query execution, making the already inadequate batch-like behavior even worse. Both of these problems can be addressed using fully pipelined query execution. The symmetric hash join operator supports such pipelining, but it requires all base data and intermediate results to be memory-resident, which is unacceptable for complex queries over large datasets. In this paper we present a multi-threaded extension of the symmetric hash join, called XJoin, that can execute effectively with far less memory. By reactively scheduling background processing, XJoin hides intermittent delays in data arrival to produce more tuples earlier. XJoin includes a very efficient, on-the-fly algorithm for preventing duplicates from being created by its independently running threads. We have implemented the XJoin operator and added it to the PREDATOR Object-Relational DBMS. Using this implementation along with traces obtained by monitoring Internet data delivery, we show that XJoin is an effective solution for providing fast query responses to users even in the presence of slow and bursty remote sources. (Also cross-referenced as UMIACS-TR-99-13)