Browsing by Author "Deligiannakis, Antonios"
Now showing 1 - 4 of 4
Results Per Page
Sort Options
Item Accurate Data Approximation in Constrained Environments(2005-06-15) Deligiannakis, Antonios; Roussopoulos, Nick; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)Several data reduction techniques have been proposed recently as methods for providing fast and fairly accurate answers to complex queries over large quantities of data. Their use has been widespread, due to the multiple benefits that they may offer in several constrained environments and applications. Compressed data representations require less space to store, less bandwidth to communicate and can provide, due to their size, very fast response times to queries. Sensor networks represent a typical constrained environment, due to the limited processing, storage and battery capabilities of the sensor nodes. Large-scale sensor networks require tight data handling and data dissemination techniques. Transmitting a full-resolution data feed from each sensor back to the base-station is often prohibitive due to (i) limited bandwidth that may not be sufficient to sustain a continuous feed from all sensors and (ii) increased power consumption due to the wireless multi-hop communication. In order to minimize the volume of the transmitted data, we can apply two well data reduction techniques: aggregation and approximation. In this dissertation we propose novel data reduction techniques for the transmission of measurements collected in sensor network environments. We first study the problem of summarizing multi-valued data feeds generated at a single sensor node, a step necessary for the transmission of large amounts of historical information collected at the node. The transmission of these measurements may either be periodic (i.e., when a certain amount of measurements has been collected), or in response to a query from the base station. We then also consider the approximate evaluation of aggregate continuous queries. A continuous query is a query that runs continuously until explicitly terminated by the user. These queries can be used to obtain a live-estimate of some (aggregated) quantity, such as the total number of moving objects detected by the sensors.Item Data Reduction Techniques for Sensor Networks(2003-08-01) Deligiannakis, Antonios; Kotidis, Yannis; Roussopoulos, NickWe are inevitably moving into a realm where small and inexpensive wireless devices would be seamlessly embedded in the physical world and form a wireless sensor network in order to perform complex monitoring and computational tasks. Such networks pose new challenges in data processing and dissemination due to the conflict between (i) the abundance of information that can be collected and processed in a distributed fashion among thousands of nodes and (ii) the limited resources (bandwidth, energy) that such devices possess. In this paper we propose a new data reduction technique that exploits the correlation and redundancy among multiple measurements on the same sensor and achieves high degree of data reduction while managing to capture even the smallest details of the recorded measurements. The key to our technique is the base signal, a series of values extracted from the real measurements, used for encoding piece-wise linear correlations among the collected data values. We provide efficient algorithms for extracting the base signal features from the data and for encoding the measurements using these features. Our experiments demonstrate that our method by far outperforms standard approximation techniques like Wavelets, Histograms and the Discrete Cosine Transform, on a variety of error metrics and for real datasets from different domains. (UMIACS-TR-2003-80)Item Dwarf: Shrinking the PetaCube(2002-04-04) Sismanis, Yannis; Deligiannakis, Antonios; Roussopoulos, Nick; Kotidis, YannisDwarf is a highly compressed structure for computing, storing, and querying data cubes. Dwarf identifies prefix and suffix structural redundancies and factors them out by coalescing their store. Prefix redundancy is high on dense areas of cubes but suffix redundancy is significantly higher for sparse areas. Putting the two together fuses the exponential sizes of high dimensional full cubes into a dramatically condensed data structure. The elimination of suffix redundancy has an equally dramatic reduction in the computation of the cube because recomputation of the redundant suffixes is avoided. This effect is multiplied in the presence of correlation amongst attributes in the cube. A Petabyte 25-dimensional cube was shrunk this way to a 2.3GB Dwarf Cube, in less than 20 minutes, a 1:400000 storage reduction ratio. Still, Dwarf provides 100% precision on cube queries and is a self-sufficient structure which requires no access to the fact table. What makes Dwarf practical is the automatic discovery, in a single pass over the fact table, of the prefix and suffix redundancies without user involvement or knowledge of the value distributions. This paper describes the Dwarf structure and the Dwarf cube construction algorithm. Further optimizations are then introduced for improving clustering and query performance. Experiments with the current implementation include comparisons on detailed measurements with real and synthetic datasets against previously published techniques. The comparisons show that Dwarfs by far outperform these techniques on all counts: storage space, creation time, query response time, and updates of cubes. Also UMIACS-TR-2002-24Item Extended Wavelets for Multiple Measures(2003-04-04) Deligiannakis, Antonios; Roussopoulos, NickWhile work in recent years has demonstrated that wavelets can be efficiently used to compress large quantities of data and provide fast and fairly accurate answers to queries, little emphasis has been placed on using wavelets in approximating datasets containing multiple measures. Existing decomposition approaches will either operate on each measure individually, or treat all measures as a vector of values and process them simultaneously. We show in this paper that the resulting individual or combined storage approaches for the wavelet coefficients of different measures that stem from these existing algorithms may lead to suboptimal storage utilization, which results to reduced accuracy to queries. To alleviate this problem, we introduce in this work the notion of an extended wavelet coefficient as a flexible storage method for the wavelet coefficients, and propose novel algorithms for selecting which extended wavelet coefficients to retain under a given storage constraint. Experimental results with both real and synthetic datasets demonstrate that our approach achieves improved accuracy to queries when compared to existing techniques. UMIACS-TR-2003-32