Tech Reports in Computer Science and Engineering

Permanent URI for this community: http://hdl.handle.net/1903/5

The technical report collections in this community are deposited by the Library of the Computer Science Department. If you have questions about these collections, please contact library staff at library@cs.umd.edu.


Efficient Execution of Multi-Query Data Analysis Batches Using Compiler Optimization Strategies
(2003-08-01) Andrade, Henrique; Aryangat, Suresh; Kurc, Tahsin; Saltz, Joel; Sussman, Alan
This work investigates the leverage that can be obtained from compiler optimization techniques for efficient execution of multi-query workloads in data analysis applications. Our approach is to address multi-query optimization at the algorithmic level by transforming a declarative specification of scientific data analysis queries into a high-level imperative program that can be made more efficient by applying compiler optimization techniques. These techniques, including loop fusion, common subexpression elimination and dead code elimination, are employed to allow data and computation reuse across queries. We describe a preliminary experimental analysis on a real remote sensing application that is used to analyze very large quantities of satellite data. The results show that our techniques achieve sizable reductions in the amount of computation and I/O necessary for executing query batches, and in the average execution times of the individual queries in a given batch. (UMIACS-TR-2003-76)
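As a rough illustration of how loop fusion and common subexpression elimination enable reuse across a query batch, the sketch below compares two hypothetical queries run separately against the same queries fused into one pass. The pixel format `(red, nir)`, the `index_value` function and both queries are assumptions made for illustration only; this is not the code generated by the system described in the report.

```python
"""Hypothetical illustration of loop fusion and common subexpression
elimination (CSE) across two queries over the same dataset.

Assumption: each "query" scans a list of (red, nir) readings with
positive values and derives a per-pixel index; names are illustrative.
"""


def index_value(red: float, nir: float) -> float:
    # Per-pixel computation shared by both queries (the common subexpression).
    return (nir - red) / (nir + red)


def run_separately(pixels):
    # Query A (mean index) and query B (max index) as two independent loops:
    # the data is scanned twice and index_value runs twice per pixel.
    total = 0.0
    for red, nir in pixels:
        total += index_value(red, nir)
    mean_a = total / len(pixels)

    best_b = float("-inf")
    for red, nir in pixels:
        best_b = max(best_b, index_value(red, nir))
    return mean_a, best_b


def run_fused(pixels):
    # Loop fusion: a single pass over the data serves both queries.
    # CSE: the index is computed once per pixel and reused by both queries.
    total = 0.0
    best_b = float("-inf")
    for red, nir in pixels:
        v = index_value(red, nir)
        total += v
        best_b = max(best_b, v)
    return total / len(pixels), best_b


if __name__ == "__main__":
    data = [(0.2, 0.6), (0.1, 0.3), (0.4, 0.4)]
    print(run_separately(data), run_fused(data))
```

Both versions return the same answers; the fused one reads each pixel once and evaluates the shared expression once, which is the kind of computation and I/O saving the abstract refers to.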
The Virtual Microscope
(2002-10-07) Catalyurek, Umit; Beynon, Michael D.; Chang, Chialin; Kurc, Tahsin; Sussman, Alan; Saltz, Joel
We present the design and implementation of the Virtual Microscope, a software system employing a client/server architecture to provide a realistic emulation of a high-power light microscope. The system provides a form of completely digital telepathology, allowing simultaneous access to archived digital slide images by multiple clients. The main problem the system targets is storing and processing the extremely large quantities of data required to represent a collection of slides. The Virtual Microscope client software runs on the end user's PC or workstation, while database software for storing, retrieving and processing the microscope image data runs on a parallel computer or on a set of workstations at one or more potentially remote sites. We have designed and implemented two versions of the data server software. One implementation is a customization of a database system framework that is optimized for a tightly coupled parallel machine with attached local disks. The second implementation is component-based, and has been designed to accommodate access to and processing of data in a distributed, heterogeneous environment. We have also developed caching client software, implemented in Java, to achieve good response time and portability across different computer platforms. The performance results presented show that the Virtual Microscope system scales well, so that many clients can be adequately serviced by an appropriately configured data server. (Also UMIACS-TR-2002-85)
Exploiting Functional Decomposition for Efficient Parallel Processing of Multiple Data Analysis Queries
(2002-10-25) Andrade, Henrique; Kurc, Tahsin; Sussman, Alan; Saltz, Joel
Reuse is a powerful method for improving system performance. In this paper, we examine functional decomposition for improving data reuse, and therefore overall query execution performance, in the context of data analysis applications. Additionally, we look at the performance effects of using various projection primitives that make it possible to transform intermediate results generated during the execution of a previous query so that they can be reused by a new query. A satellite data analysis application is used to experimentally show the performance benefits achieved using the techniques presented in the paper. (Also UMIACS-TR-2002-84)
Active Proxy-G: Optimizing the Query Execution Process in the Grid
(2002-05-22) Andrade, Henrique; Kurc, Tahsin; Sussman, Alan; Saltz, Joel
The Grid environment facilitates collaborative work and allows many users to query and process data over geographically dispersed data repositories. Over the past several years, there has been a growing interest in developing applications that interactively analyze datasets, potentially in a collaborative setting. We describe an Active Proxy-G service that is able to cache query results, use those results for answering new incoming queries, generate subqueries for the parts of a query that cannot be produced from the cache, and submit the subqueries for final processing at application servers that store the raw datasets. We present an experimental evaluation to illustrate the effects of various design tradeoffs. We also show the benefits that two real applications gain from using the middleware. (Also UMIACS-TR-2002-41)
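The cache-then-subquery flow described above (answer what the cache can, send a subquery for the rest) can be sketched roughly as follows. This is a minimal sketch under stated assumptions: a query is modeled as a set of hypothetical `(row, col)` chunk identifiers, the `server` callable stands in for a remote application server, and results are cached per chunk. It does not reproduce Active Proxy-G's actual interfaces or its semantic matching of queries.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical model: a query asks for a set of dataset chunks.
Chunk = Tuple[int, int]


class ResultCache:
    """Proxy-style cache that forwards only the uncached part of a query."""

    def __init__(self, server: Callable[[FrozenSet[Chunk]], Dict[Chunk, float]]):
        self._server = server  # stands in for a remote application server
        self._cache: Dict[Chunk, float] = {}
        self.subqueries: List[FrozenSet[Chunk]] = []  # log of forwarded subqueries

    def query(self, chunks: FrozenSet[Chunk]) -> Dict[Chunk, float]:
        # Answer whatever the cache already holds.
        answer = {c: self._cache[c] for c in chunks if c in self._cache}
        missing = frozenset(c for c in chunks if c not in self._cache)
        if missing:
            # Generate a subquery only for the part the cache cannot produce.
            self.subqueries.append(missing)
            fresh = self._server(missing)
            self._cache.update(fresh)
            answer.update(fresh)
        return answer


def toy_server(missing: FrozenSet[Chunk]) -> Dict[Chunk, float]:
    # Hypothetical "raw dataset" computation on the application server.
    return {c: float(c[0] * 10 + c[1]) for c in missing}
```

For example, after `cache.query(frozenset({(0, 0), (0, 1)}))`, a second call `cache.query(frozenset({(0, 1), (1, 1)}))` forwards only `{(1, 1)}` to the server and serves `(0, 1)` from the cache.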
Servicing Mixed Data Intensive Query Workloads
(2002-02-25) Andrade, Henrique; Kurc, Tahsin; Sussman, Alan; Borovikov, Eugene; Saltz, Joel
When data analysis applications are employed in a multi-client environment, a data server must service multiple simultaneous queries, each of which may employ complex user-defined data structures and operations on the data. It is then necessary to harness inter- and intra-query commonalities and system resources to improve the performance of the data server. We have developed a framework and customizable middleware to enable reuse of intermediate and final results among queries, through an in-memory semantic cache and user-defined transformation functions. Since resources such as processing power and memory space are limited on the machine hosting the server, effective scheduling of incoming queries and efficient cache replacement policies are challenging issues that must be addressed. We have addressed the scheduling problem in earlier work, and in this paper we describe and evaluate several cache replacement policies. We present an experimental evaluation of the policies on a shared-memory parallel system using two applications from different domains. (Also UMIACS-TR-2002-21)
Multiple Query Optimization For Data Analysis Applications on Clusters of SMPs
(2001-11-21) Andrade, Henrique; Kurc, Tahsin; Sussman, Alan; Saltz, Joel
This paper is concerned with the efficient execution of multiple query workloads on a cluster of SMPs. We target applications that access and manipulate large scientific datasets. Queries in these applications involve user-defined processing operations on data and distributed data structures to hold intermediate and final results. Our goal is to implement system components to leverage previously computed query results and to effectively utilize processing power and aggregated I/O bandwidth on SMP nodes so that both single queries and multi-query batches can be efficiently executed. (Also referenced as UMIACS-TR-2001-78)
Scheduling Multiple Data Visualization Query Workloads on a Shared Memory Machine
(2001-10-10) Andrade, Henrique; Kurc, Tahsin; Sussman, Alan; Saltz, Joel
Query scheduling plays an important role when systems are faced with limited resources and high workloads. It becomes even more relevant for servers applying multiple query optimization techniques to batches of queries, in which portions of datasets as well as intermediate results are maintained in memory to speed up query evaluation. In this work, we present a dynamic query scheduling model based on a priority queue implementation using a directed graph and a strategy for ranking queries. We examine the relative performance of four ranking strategies and compare them against a first-in first-out (FIFO) scheduling strategy. We describe experimental results on a shared-memory machine using two different versions of an application, called the Virtual Microscope, for browsing digitized microscopy images. (Also cross-referenced as UMIACS-TR-2001-68)
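To make the contrast with FIFO concrete, here is a minimal sketch of dynamic priority-queue scheduling. The ranking rule (fraction of a query's data chunks already in memory), the `Query` type and the assumption that an executed query's chunks stay cached are all hypothetical; the report's directed-graph representation and its four ranking strategies are not reproduced here.

```python
import heapq
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class Query:
    qid: str
    region: FrozenSet[int]  # hypothetical: ids of the data chunks the query reads


def rank(query: Query, cached: Set[int]) -> float:
    # Hypothetical ranking: fraction of the query's chunks already in memory.
    if not query.region:
        return 0.0
    return len(query.region & cached) / len(query.region)


def schedule(queries: List[Query], cached: Iterable[int]) -> List[str]:
    cached = set(cached)
    pending = list(enumerate(queries))  # arrival position breaks ties FIFO-style
    order = []
    while pending:
        # Ranks change as the cache fills, so the priority queue is rebuilt
        # each round; heapq is a min-heap, hence the negated rank.
        heap = [(-rank(q, cached), arrival, q) for arrival, q in pending]
        heapq.heapify(heap)
        _, arrival, best = heapq.heappop(heap)
        order.append(best.qid)
        cached |= best.region  # assumption: executed query's data stays cached
        pending = [(a, q) for a, q in pending if a != arrival]
    return order


def fifo(queries: List[Query]) -> List[str]:
    # Baseline: execute in arrival order.
    return [q.qid for q in queries]
```

With chunks `{1, 2}` in memory and queries arriving as `q1` (chunks 5, 6), `q2` (chunks 1, 2, 3) and `q3` (chunks 5, 7), FIFO runs `q1, q2, q3`, while the ranked scheduler runs `q2` first because most of its data is already resident, giving `q2, q1, q3`.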
Compiler Supported High-level Abstractions for Sparse Disk-Resident Datasets
(2001-09-05) Ferreira, Renato; Agrawal, Gagan; Saltz, Joel
Processing and analyzing large volumes of data plays an increasingly important role in many domains of scientific research. The complexity and irregularity of datasets in many domains make the task of developing such processing applications tedious and error-prone. We propose the use of high-level abstractions for hiding the irregularities in these datasets and enabling rapid development of correct, but not necessarily efficient, data processing applications. We present two execution strategies and a set of compiler analysis techniques for obtaining high performance from applications written using our proposed high-level abstractions. Our execution strategies achieve high locality in disk accesses: once a disk block is read from the disk, all iterations that read any of the elements from this disk block are performed. To support our execution strategies and improve performance, we have developed static analysis techniques for: 1) computing the set of iterations that access a particular right-hand-side element, 2) generating a function that can be applied to the meta-data associated with each disk block, for determining if that disk block needs to be read, and 3) performing code hoisting of conditionals. (Cross-referenced as UMIACS-TR-2001-50)
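The block-driven execution strategy above, together with the first two analyses (the element-to-iterations mapping and a per-block metadata test), can be sketched at toy scale as follows. This is a hypothetical sketch, not the report's compiler output: the `DiskBlock` layout, its `lo`/`hi` metadata, the index-range test and the doubled-value loop body are all assumptions. Note that this range test is coarse, so a block lying between needed elements would still be read.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Hashable, Tuple


@dataclass
class DiskBlock:
    block_id: int
    lo: int  # hypothetical metadata: smallest element index stored in the block
    hi: int  # hypothetical metadata: largest element index stored in the block
    values: Dict[int, float]  # element index -> stored value


def needs_read(block: DiskBlock, lo: int, hi: int) -> bool:
    # Metadata test: skip blocks whose index range cannot contain a needed element.
    return block.hi >= lo and block.lo <= hi


def execute(blocks, reads_of: Dict[Hashable, int]) -> Tuple[dict, int]:
    """reads_of maps each iteration id to the right-hand-side element it reads."""
    # Inverse map: element -> set of iterations that access it.
    iters_for = defaultdict(list)
    for it, elem in reads_of.items():
        iters_for[elem].append(it)
    lo, hi = min(iters_for), max(iters_for)

    results, blocks_read = {}, 0
    for block in blocks:
        if not needs_read(block, lo, hi):
            continue
        blocks_read += 1  # stands in for one disk read
        # Perform every iteration that touches an element of this block now,
        # so the block never has to be read again.
        for elem, value in block.values.items():
            for it in iters_for.get(elem, ()):
                results[it] = 2 * value  # hypothetical loop body
    return results, blocks_read
```

With three blocks holding elements 0-3, 4-7 and 8-11, and iterations that read elements 1, 5, 1 and 6, only the first two blocks pass the metadata test, and each is read exactly once.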
Efficient Execution of Multiple Query Workloads in Data Analysis Applications
(2001-05-10) Andrade, Henrique; Kurc, Tahsin; Sussman, Alan; Saltz, Joel
Applications that analyze, mine, and visualize large datasets are considered an important class of applications in many areas of science, engineering and business. Queries commonly executed in data analysis applications often involve user-defined processing of data and application-specific data structures. If data analysis is employed in a collaborative environment, the data server should execute multiple such queries simultaneously to minimize the response time to the clients of the data analysis application. In a multi-client environment, there may be a large number of overlapping regions of interest and common processing requirements among the clients. Thus, better performance can be achieved if commonalities among multiple queries can be exploited. In this paper we present the design of a runtime system for executing multiple query workloads on a shared-memory machine. We describe initial experimental results using an application for browsing digitized microscopy images. (Cross-referenced as UMIACS-TR-2001-35)
A Component-based Implementation of Iso-surface Rendering for Visualizing Large Datasets
(2001-05-10) Beynon, Michael D.; Kurc, Tahsin; Catalyurek, Umit; Sussman, Alan; Saltz, Joel
Isosurface rendering is a technique for extracting and visualizing surfaces within a 3D volume. It is a widely used visualization method in many application areas. In this paper, we describe a component-based implementation of isosurface rendering for visualizing very large datasets in a distributed, heterogeneous environment. We use DataCutter, a component framework that supports subsetting and user-defined processing of large multi-dimensional datasets in a distributed environment. We present experimental results on a heterogeneous collection of multiprocessor machines. (Cross-referenced as UMIACS-TR-2001-34)
