Browsing by Author "Roussopoulos, Nick"
Now showing 1 - 20 of 33
Item
Adaptive Cost Estimation for Client-Server based Heterogeneous Database Systems (1998-10-15) Yao, Zhaohui; Chen, Chungmin Melvin; Roussopoulos, Nick
In this paper, we propose a new method for estimating query cost in client-server based heterogeneous database management systems. The cost estimation parameters are adjusted by an Adaptive Cost Estimation (ACE) module, which uses query execution feedback to yield increasingly accurate cost estimates. The most important features of ACE are its detailed cost model, which accounts for all costs incurred, its rapid convergence to the actual parameter values, and its low overhead, which permits continuous adaptation during the run time of the system. ACE has been implemented and tested with Oracle 6, Oracle 7, Ingres, and ADMS. Extensive experiments performed on these systems show that ACE's time estimates are within 20% of the real wall-clock time for more than 92% of the queries. This percentage surpasses 98% for queries over 20 seconds. (Also cross-referenced as UMIACS-TR-96-37)
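A minimal sketch of feedback-driven cost calibration in the spirit of the entry above (not ACE's actual cost model): predicted cost is a weighted sum of per-query resource counts, and the weights are nudged toward the observed wall-clock time after every execution. The class name, feature names, and the normalized-LMS update rule are assumptions for illustration.

    # Hypothetical sketch of feedback-adjusted cost parameters; not ACE's model.
    class FeedbackCostModel:
        def __init__(self, features, learning_rate=0.5):
            # One calibration coefficient per cost component (e.g. I/O, CPU, messages).
            self.coef = {f: 1.0 for f in features}
            self.lr = learning_rate

        def estimate(self, counts):
            # Predicted cost = sum of coefficient * observed resource count.
            return sum(self.coef[f] * counts.get(f, 0.0) for f in self.coef)

        def feedback(self, counts, observed_time):
            # Normalized LMS-style step: move the prediction toward the
            # measured wall-clock time without overshooting.
            error = observed_time - self.estimate(counts)
            norm = sum(c * c for c in counts.values()) or 1.0
            for f, c in counts.items():
                if f in self.coef:
                    self.coef[f] += self.lr * error * c / norm

    # Usage: estimate, run the query, then feed the measured time back in.
    model = FeedbackCostModel(["pages_read", "tuples_processed", "msgs_sent"])
    counts = {"pages_read": 120.0, "tuples_processed": 5000.0, "msgs_sent": 4.0}
    predicted = model.estimate(counts)
    model.feedback(counts, observed_time=0.8)   # seconds measured at run time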
Item
Adaptive Database Buffer Allocation Using Query Feedback (1998-10-15) Chen, Chungmin Melvin; Roussopoulos, Nick
In this paper, we propose the concept of using query execution feedback for improving database buffer management. A query feedback model which adaptively quantifies the page fault characteristics of all query access patterns, including sequential, looping, and, most importantly, random, is defined. Based on this model, a load control and a marginal gain ratio buffer allocation scheme are developed. Simulation experiments show that the proposed method is consistently better than the previous methods and, in most cases, it significantly outperforms all other methods for random access reference patterns. (Also cross-referenced as UMIACS-TR-93-49)

Item
Adaptive Probabilistic Search (APS) for Peer-to-Peer Networks (2003-02-27) Tsoumakos, Dimitrios; Roussopoulos, Nick
Peer-to-Peer networks are gaining increasing attention from both the scientific and the large Internet user community. Popular applications utilizing this new technology offer many attractive features to a growing number of users. At the heart of such networks lies the data retrieval algorithm. Proposed methods either depend on the network-disastrous flooding and its variations or utilize various indices that are too expensive to maintain. In this report we describe an adaptive, bandwidth-efficient, and easy to deploy search algorithm for unstructured Peer-to-Peer networks, the Adaptive Probabilistic Search method (APS). Our scheme utilizes feedback from previous searches to probabilistically guide future ones. Extensive simulation results show that APS achieves high success rates, an increased number of discovered objects, very low bandwidth consumption, and adaptation to changing topologies. UMIACS-TR-2003-21
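A toy sketch of the feedback idea the APS entry above describes: each peer keeps a relative index value per (neighbor, object), forwards a search walker to a neighbor with probability proportional to those values, and raises or lowers them when the walker later reports success or failure. The class, constants, and update rule are illustrative assumptions rather than the published algorithm's exact policies.

    import random

    # Illustrative APS-style peer state; the paper's index initialization and
    # update policies are richer than this sketch.
    class ApsPeer:
        def __init__(self, neighbors, init=10.0, reward=5.0, penalty=2.0):
            self.neighbors = list(neighbors)
            self.index = {}            # (neighbor, object_id) -> relative index value
            self.init, self.reward, self.penalty = init, reward, penalty

        def _val(self, n, obj):
            return self.index.setdefault((n, obj), self.init)

        def pick_neighbor(self, obj):
            # Forward the walker probabilistically, proportionally to index values.
            weights = [self._val(n, obj) for n in self.neighbors]
            return random.choices(self.neighbors, weights=weights, k=1)[0]

        def feedback(self, neighbor, obj, success):
            # Successful walkers reinforce the path; failed ones weaken it.
            delta = self.reward if success else -self.penalty
            self.index[(neighbor, obj)] = max(1.0, self._val(neighbor, obj) + delta)

    peer = ApsPeer(neighbors=["A", "B", "C"])
    chosen = peer.pick_neighbor("file-42")
    peer.feedback(chosen, "file-42", success=True)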
Item
Adaptive Selectivity Estimation Using Query Feedback (1998-10-15) Chen, Chungmin Melvin; Roussopoulos, Nick

Item
Adaptive WebView Materialization (2001) Labrinidis, Alexandros; Roussopoulos, Nick; ISR; CSHCN
Dynamic content generation poses huge resource demands on web servers, creating a scalability problem. WebView Materialization, where web pages are cached and constantly refreshed in the background, has been shown to ameliorate the scalability problem without sacrificing data freshness. In this work we present an adaptive online algorithm for selecting which WebViews to materialize that realizes the trade-off between Quality of Service and Quality of Data. Our algorithm performs very close to the static, off-line optimal algorithm and, under rapid workload changes, it outperforms the optimal.
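A simplified sketch of the kind of online materialization decision the WebView entry above describes: per WebView, track recent access and update rates and materialize when serving cached copies (plus background refreshes) is estimated to be cheaper than regenerating on every access. The cost formulas, names, and threshold rule here are assumptions for illustration, not the paper's algorithm.

    # Hypothetical cost-based policy for choosing which WebViews to materialize.
    # Rates would be measured over a sliding window elsewhere in the server.
    def choose_materialized(webviews, gen_cost, refresh_cost):
        """webviews: dict name -> (access_rate, update_rate) in events/sec."""
        selected = set()
        for name, (access_rate, update_rate) in webviews.items():
            on_demand = access_rate * gen_cost              # regenerate per access
            materialized = update_rate * refresh_cost       # refresh in background
            if materialized < on_demand:                    # cheaper to keep it fresh
                selected.add(name)
        return selected

    views = {"front_page": (50.0, 2.0), "stock_ticker": (5.0, 40.0)}
    print(choose_materialized(views, gen_cost=0.02, refresh_cost=0.02))
    # -> {'front_page'}: heavily read, rarely updated views get materialized.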
Item
Analysis and Comparison of P2P Search Methods (2003-11-25) Tsoumakos, Dimitrios; Roussopoulos, Nick
The popularity and bandwidth consumption attributed to current Peer-to-Peer file-sharing applications make the operation of these distributed systems very important for the Internet community. Efficient object discovery is the first step towards the realization of distributed resource-sharing. In this work, we present a detailed overview of recent and existing search methods for unstructured Peer-to-Peer networks. We analyze the performance of the algorithms relative to various metrics, placing emphasis on the success rate, bandwidth efficiency, and adaptation to dynamic network conditions. Simulation results are used to empirically evaluate the behavior of nine representative schemes under a variety of different environments. (UMIACS-TR-2003-107)

Item
APRE: A Replication Method for Unstructured P2P Networks (2006-02) Tsoumakos, Dimitrios; Roussopoulos, Nick
We present APRE, a replication method for structureless Peer-to-Peer overlays. The goal of our method is to achieve real-time replication of even the most sparsely located content relative to demand. APRE adaptively expands or contracts the replica set of an object in order to improve the sharing process and achieve a low load distribution among the providers. To achieve that, it utilizes search knowledge to identify possible replication targets inside query-intensive areas of the overlay. We present detailed simulation results in which APRE exhibits both efficiency and robustness relative to the number of requesters and the respective request rates. The scheme proves particularly useful in the event of flash crowds, managing to quickly adapt to sudden surges in load.

Item
Automatic Deployment of Application-Specific Metadata and Code in MOCHA (1999-12-10) Rodriguez, Manuel; Roussopoulos, Nick
Database middleware systems require the deployment of application-specific data types and query operators to the servers and clients in the system. Existing middleware solutions rely on developers and system administrators to port and manually install all this application-specific functionality at all sites in the system. This approach cannot scale to an environment in which there are hundreds of data sources, such as those accessed by the Web, and even more custom-tailored applications, since the complexity and the cost involved in maintaining a code base system-wide are enormous. This paper describes a novel metadata-driven framework designed to automate the deployment of all application-specific functionality used by a middleware system. We used Java and XML to implement this framework in MOCHA, a middleware system developed at the University of Maryland. We first present the kinds of services, metadata elements, and software tools used in MOCHA to automate code deployment. Then, we describe how the features of MOCHA simplify the administration and reduce the management cost of a middleware system in a large-scale environment. (Also cross-referenced as UMIACS-TR-99-61)

Item
A case for in-kernel data streaming over the file subsystem (1998-10-15) Gupta, Sandeep; Baras, John S.; Kelley, Stephen; Roussopoulos, Nick
(Also cross-referenced as UMIACS-TR-96-36)

Item
Data Reduction Techniques for Sensor Networks (2003-08-01) Deligiannakis, Antonios; Kotidis, Yannis; Roussopoulos, Nick
We are inevitably moving into a realm where small and inexpensive wireless devices will be seamlessly embedded in the physical world and form a wireless sensor network in order to perform complex monitoring and computational tasks. Such networks pose new challenges in data processing and dissemination due to the conflict between (i) the abundance of information that can be collected and processed in a distributed fashion among thousands of nodes and (ii) the limited resources (bandwidth, energy) that such devices possess. In this paper we propose a new data reduction technique that exploits the correlation and redundancy among multiple measurements on the same sensor and achieves a high degree of data reduction while managing to capture even the smallest details of the recorded measurements. The key to our technique is the base signal, a series of values extracted from the real measurements, used for encoding piecewise-linear correlations among the collected data values. We provide efficient algorithms for extracting the base signal features from the data and for encoding the measurements using these features. Our experiments demonstrate that our method by far outperforms standard approximation techniques like Wavelets, Histograms, and the Discrete Cosine Transform on a variety of error metrics and for real datasets from different domains. (UMIACS-TR-2003-80)
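A rough sketch of the piecewise-linear encoding idea from the sensor-network data reduction entry above: approximate a segment of measurements as a*base + b against a candidate base-signal segment, and keep only the compact triple (segment id, a, b) when the fit is good enough. How the base signal itself is extracted, and which error metric the paper uses, are not reproduced here; the function and threshold are illustrative assumptions.

    # Hypothetical encoder: fit y ~ a*x + b against a base-signal segment by
    # ordinary least squares and keep the compact (a, b) pair if the error is small.
    def fit_against_base(base_seg, measured_seg):
        n = len(base_seg)
        mean_x = sum(base_seg) / n
        mean_y = sum(measured_seg) / n
        var_x = sum((x - mean_x) ** 2 for x in base_seg)
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(base_seg, measured_seg))
        a = cov / var_x if var_x else 0.0
        b = mean_y - a * mean_x
        err = max(abs((a * x + b) - y) for x, y in zip(base_seg, measured_seg))
        return a, b, err   # maximum absolute reconstruction error

    base = [10.0, 12.0, 15.0, 13.0]        # stored once, shared by many segments
    reading = [20.4, 24.1, 30.2, 26.0]     # roughly 2*base + a small offset
    a, b, err = fit_against_base(base, reading)
    if err < 0.5:
        encoded = ("segment-7", a, b)      # transmit 3 numbers instead of the raw readings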
Item
Disseminating Updates to Mobile Clients (1998) Stathatos, Konstantinos; Roussopoulos, Nick; Baras, John S.; ISR; CSHCN
In this paper, we address the problem of propagating data updates to a large number of mobile clients. Typically, mobile clients operate autonomously, i.e., disconnected from data servers, for prolonged periods of time, relying on locally replicated corporate data (e.g., database views, file systems). From time to time, they need to refresh their replicas with data changes registered at a central data repository. We propose a hybrid approach for delivering these updates to the clients. We use a broadcast channel to "cache on the air" fresh updates for as long as they are high on demand. At the same time, any requests for older updates are individually serviced by the server through a separate channel. The air-caching satisfies the bulk of clients' needs, increasing data throughput many-fold compared to traditional data delivery mechanisms. We describe a hierarchical air-cache structure and analyze the performance of broadcasting a log of committed updates. Based on that, we propose a technique that dynamically modifies the contents and the structure of the air-cache according to the number and the (dis)connection habits of the clients. Through extensive simulations, we demonstrate the adaptiveness, efficiency, and practicality of the proposed system even for very large client populations.

Item
The Dwarf Data Cube Eliminates the High Dimensionality Curse (2003-12-18) Sismanis, Yannis; Roussopoulos, Nick
The data cube operator encapsulates all possible groupings of a data set and has proved to be an invaluable tool in analyzing vast amounts of data. However, its apparent exponential complexity has significantly limited its applicability to low dimensional datasets. Recently the idea of the dwarf data cube model was introduced, showing that high-dimensional "dwarf data cubes" are orders of magnitude smaller in size than the original data cubes, even when they calculate and store every possible aggregation with 100% precision. In this paper we present a surprising analytical result proving that the size of dwarf cubes grows polynomially with the dimensionality of the data set and, therefore, a full data cube at 100% precision is not inherently cursed by high dimensionality. This striking result of polynomial complexity reformulates the context of cube management and redefines most of the problems associated with data warehousing and On-Line Analytical Processing. We also develop an efficient algorithm for estimating the size of dwarf data cubes before actually computing them. Finally, we complement our analytical approach with an experimental evaluation using real and synthetic data sets, and demonstrate our results. UMIACS-TR-2003-120

Item
Dwarf: Shrinking the PetaCube (2002-04-04) Sismanis, Yannis; Deligiannakis, Antonios; Roussopoulos, Nick; Kotidis, Yannis
Dwarf is a highly compressed structure for computing, storing, and querying data cubes. Dwarf identifies prefix and suffix structural redundancies and factors them out by coalescing their store. Prefix redundancy is high in dense areas of cubes, but suffix redundancy is significantly higher in sparse areas. Putting the two together fuses the exponential sizes of high dimensional full cubes into a dramatically condensed data structure. The elimination of suffix redundancy yields an equally dramatic reduction in the computation of the cube, because recomputation of the redundant suffixes is avoided. This effect is multiplied in the presence of correlation amongst attributes in the cube. A Petabyte 25-dimensional cube was shrunk this way to a 2.3GB Dwarf Cube in less than 20 minutes, a 1:400000 storage reduction ratio. Still, Dwarf provides 100% precision on cube queries and is a self-sufficient structure which requires no access to the fact table. What makes Dwarf practical is the automatic discovery, in a single pass over the fact table, of the prefix and suffix redundancies without user involvement or knowledge of the value distributions. This paper describes the Dwarf structure and the Dwarf cube construction algorithm. Further optimizations are then introduced for improving clustering and query performance. Experiments with the current implementation include comparisons on detailed measurements with real and synthetic datasets against previously published techniques. The comparisons show that Dwarfs by far outperform these techniques on all counts: storage space, creation time, query response time, and updates of cubes. Also UMIACS-TR-2002-24
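A very small illustration of the prefix-coalescing part of the Dwarf structure described above: tuples sorted on their dimension values share one path per common prefix, so the same prefix is never stored twice. Suffix coalescing, which accounts for most of the savings in sparse cubes, is deliberately omitted, and the node layout here is an assumption, not Dwarf's actual on-disk format.

    # Toy prefix-sharing store for fact tuples; real Dwarf nodes also coalesce
    # identical suffixes and keep aggregate values and ALL-cells per node.
    def build_prefix_trie(tuples, measures):
        root = {}
        for dims, value in zip(tuples, measures):
            node = root
            for key in dims:                     # common prefixes reuse existing nodes
                node = node.setdefault(key, {})
            node["_agg"] = node.get("_agg", 0) + value   # aggregate at the leaf
        return root

    facts = [("NY", "store1", "camera"), ("NY", "store1", "phone"), ("NY", "store2", "phone")]
    trie = build_prefix_trie(facts, measures=[3, 5, 2])
    # The ("NY", "store1") prefix is stored once and shared by both of its tuples.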
Item
Efficient Refreshment of Data Warehouse Views (1998-10-15) Baekgaard, Lars; Roussopoulos, Nick
A data warehouse is a view on a set of distributed and possibly loosely coupled source databases. For efficiency reasons a warehouse should be maintained as a materialized view. Therefore, efficient incremental algorithms must be used to periodically refresh the data warehouse. It is possible and desirable to separate the process of warehouse refreshment from the process of warehouse use. In this paper we describe and compare view refreshment algorithms that are based on different combinations of materialized views, partially materialized views, and pointers. Our contribution is twofold. First, our algorithms and data structures are designed to minimize network communication and interactions between the warehouse and the source databases. The minimal set of data that is necessary for both warehouse refreshment and warehouse use is stored on the warehouse. Second, we describe the results of an experiment comparing these methods with respect to storage overhead and I/O. Briefly, the experiments show that algorithms based on a combination of partially materialized views and pointers outperform algorithms based on materialized views. (Also cross-referenced as UMIACS-TR-96-33)

Item
Extended Wavelets for Multiple Measures (2003-04-04) Deligiannakis, Antonios; Roussopoulos, Nick
While work in recent years has demonstrated that wavelets can be efficiently used to compress large quantities of data and provide fast and fairly accurate answers to queries, little emphasis has been placed on using wavelets to approximate datasets containing multiple measures. Existing decomposition approaches will either operate on each measure individually, or treat all measures as a vector of values and process them simultaneously. We show in this paper that the resulting individual or combined storage approaches for the wavelet coefficients of different measures that stem from these existing algorithms may lead to suboptimal storage utilization, which results in reduced accuracy for queries. To alleviate this problem, we introduce in this work the notion of an extended wavelet coefficient as a flexible storage method for the wavelet coefficients, and propose novel algorithms for selecting which extended wavelet coefficients to retain under a given storage constraint. Experimental results with both real and synthetic datasets demonstrate that our approach achieves improved accuracy for queries when compared to existing techniques. UMIACS-TR-2003-32

Item
Hashing Moving Objects (2000-06-03) Song, Zhexuan; Roussopoulos, Nick
In real-life applications, objects are both spatially and temporally referenced. Objects which continuously change their location are called moving objects. With the development of wireless communication and positioning technology, it becomes necessary to store and index those objects in a database. Due to the complexity of the problem, many pure spatial index structures are unable to index large volumes of moving objects in a database. In this paper, we propose a new idea based on a hashing technique. Since it is impossible to re-index all the objects after each time period, we store the objects in buckets. When an object moves within a bucket, the database does not make any change. By using this technique, the number of database updates is greatly reduced, which makes the indexing procedure feasible. We then extend the previous system structure by introducing a filter layer between the position information collectors and the database. Four different methods based on the new system structure are also presented. Performance experiments were conducted to evaluate different aspects of our indexing techniques, and the conclusions are included in the paper. Department of Computer Science, University of Maryland

Item
Hashing Technique: A New Index Method for High Dimensional Data (1999-09-25) Song, Zhexuan; Roussopoulos, Nick
As dimensionality grows, sequential scan processing becomes more efficient than most index-based queries. In this paper, we propose a new index method for high-dimensional data spaces. This method is based on a hashing technique. The basic idea is to first find a hashing function which maps the given d-dimensional data into d'-dimensional buckets, where d' << d, and then use existing index techniques to manage those buckets. We also define some properties of a good hashing function and give four hashing functions. To demonstrate the efficiency of our idea, we experimentally compared our algorithms with sequential scan and the Pyramid-Technique. The results demonstrate that this method outperforms the others for skewed data sets, and it always beats the sequential scan, using only half the elapsed time for range queries. However, if the data is uniformly distributed, the Pyramid-Technique is still the best method.
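A minimal sketch of the bucket idea shared by the two hashing entries above: a d-dimensional point is mapped to a much lower-dimensional bucket key by quantizing a few coordinates, the buckets are managed by any existing index, and (for moving objects) the index is touched only when an object's bucket key actually changes. The specific hash functions studied in the papers are not reproduced here; everything below is an illustrative assumption.

    # Illustrative bucket mapping: quantize the first d' coordinates into a grid key.
    def bucket_key(point, d_prime=2, cell=10.0):
        return tuple(int(point[i] // cell) for i in range(d_prime))

    class BucketIndex:
        def __init__(self):
            self.buckets = {}          # bucket key -> set of object ids
            self.current = {}          # object id -> its current bucket key

        def update(self, obj_id, point):
            key = bucket_key(point)
            if self.current.get(obj_id) == key:
                return False                       # moved within its bucket: no index work
            old = self.current.get(obj_id)
            if old is not None:
                self.buckets[old].discard(obj_id)
            self.buckets.setdefault(key, set()).add(obj_id)
            self.current[obj_id] = key
            return True                            # bucket changed: one cheap index update

    idx = BucketIndex()
    idx.update("car-17", (12.0, 48.0, 3.1, 0.7))   # inserted into bucket (1, 4)
    idx.update("car-17", (14.5, 49.2, 9.9, 5.0))   # same bucket: no database update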
Item
Hybrid Network Management (1996) Baras, John S.; Ball, Michael O.; Karne, Ramesh K.; Whitefield, David; Kelley, Stephen; Jang, Kap D.; Plaisant, Catherine; Roussopoulos, Nick; Stathatos, Kostas; Vakhutinsky, Andrew; Valluri, Jaibharat; ISR; CSHCN
We describe our collaborative efforts towards the design and implementation of a next-generation integrated network management system for hybrid networks (INMS/HN). We describe the overall software architecture of the system at its current stage of development. This network management system is specifically designed to address issues relevant to complex heterogeneous networks consisting of seamlessly interoperable terrestrial and satellite networks. Network management systems are a key element for interoperability in such networks. We describe the integration of configuration management and performance management. The next step in this integration is fault management. In particular, we describe the object model, issues of the Graphical User Interface (GUI), browsing tools and performance data graphical widget displays, and management information database (MIB) organization issues. Several components of the system are being commercialized by Hughes Network Systems. A revised version of this technical report has been published in Proceedings of the AIAA 16th International Communications Satellite Systems Conference and Exhibit, Part 1, pp. 490-500, Washington, D.C., February 25-29, 1996.

Item
The Implementation and Performance Evaluation of the ADMS Query Optimizer: Integrating Query Result Caching and Matching (1998-10-15) Chen, Chungmin Melvin; Roussopoulos, Nick
In this paper, we describe the design and evaluation of the ADMS optimizer. Capitalizing on a structure called the Logical Access Path Schema to model the derivation relationships among cached query results, the optimizer is able to perform query matching concurrently with optimization and generate more efficient query plans using cached results. The optimizer also features data caching and pointer caching, different cache replacement strategies, and different cache update strategies. An extensive set of experiments was conducted, and the results showed that pointer caching and dynamic cache update strategies substantially speed up query computations and, thus, increase query throughput under situations with fair query correlation and update load. The required cache space is relatively small, and the extra computation overhead introduced by the caching and matching mechanism is more than offset by the time saved in query processing. (Also cross-referenced as UMIACS-TR-93-106)

Item
Managing File Subsystem Data Streams for Databases on Networked Systems (1996) Gupta, Sandeep K.; Baras, John S.; Kelley, Stephen; Roussopoulos, Nick; ISR; CSHCN
One important activity for networked database systems that distribute data across several workstations is moving data between the file and network subsystems. It is possible to create data streams in the operating system kernel. If provided on a system, they allow user-level processes to request transfer of data without having to copy it into user space. This is particularly useful for data whose content or format is not modified during the transfer. In this paper we present a conservative criterion for access and control for the management of such data streams for databases in a networked environment, and define the implementation requirements for achieving the criterion. The approach is to maintain at least the current level of access management. We define the specific implementation semantics that this criterion entails.
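The last two entries concern moving data between the file and network subsystems without copying it through user space. As a point of reference only (the papers describe a kernel-level streaming facility with its own access-control criterion, not this API), modern systems expose the same zero-copy idea through calls such as Linux sendfile(2), shown here via Python's os.sendfile; the host, port, and path are placeholders.

    import os
    import socket

    # Stream a file to a TCP peer without copying the bytes into user space.
    # Illustrates the zero-copy idea only; not the mechanism from the papers above.
    def stream_file(path, host="127.0.0.1", port=9000, chunk=1 << 20):
        with socket.create_connection((host, port)) as sock, open(path, "rb") as f:
            offset, size = 0, os.fstat(f.fileno()).st_size
            while offset < size:
                # The kernel moves bytes file -> socket directly (Linux sendfile(2)).
                sent = os.sendfile(sock.fileno(), f.fileno(), offset, chunk)
                if sent == 0:
                    break
                offset += sent

    # stream_file("/var/data/relation.dat")   # hypothetical path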