Tech Reports in Computer Science and Engineering
Permanent URI for this community: http://hdl.handle.net/1903/5
The technical reports collections in this community are deposited by the Library of the Computer Science department. If you have questions about these collections, please contact library staff at library@cs.umd.edu.
Item: Compiler-directed Dynamic Linking for Mobile Programs (1998-10-15)
Acharya, Anurag; Saltz, Joel

In this paper, we present a compiler-directed technique for safe dynamic linking for mobile programs. Our technique guarantees that linking failures can occur only when a program arrives at a new execution site and that such failures can be delivered to the program as an error code or an exception. We use interprocedural analysis to identify the set of names that must be linked at the different sites the program executes on. We use a combination of runtime and compile-time techniques to identify the calling context and to link only the names needed in that context. Our technique is able to handle recursive programs as well as separately compiled code that may itself be able to move. We discuss language constructs for controlling the behavior of dynamic linking and the implications of some of these constructs for application structure. (Also cross-referenced as UMIACS-TR-96-81)

Item: A Customizable Simulator for Workstation Networks (1998-10-15)
Uysal, Mustafa; Acharya, Anurag; Bennett, Robert; Saltz, Joel

We present a customizable simulator called netsim for high-performance point-to-point workstation networks that is accurate enough to be used for application-level performance analysis yet easy enough to customize for multiple architectures and software configurations. Customization is accomplished without using any proprietary information, using only publicly available hardware specifications and information that can be readily determined with a suite of test programs. We customized netsim for two platforms: a 16-node IBM SP-2 with a multistage network and a 10-node DEC Alpha Farm with an ATM switch. We show that netsim successfully models these two architectures with a 2-6% error on the SP-2 and a 10% error on the Alpha Farm for most test cases.
It achieves this accuracy at the cost of a 7-36 fold simulation slowdown with respect to the SP-2 and a 3-8 fold slowdown with respect to the Alpha Farm. In addition, we show that cross-traffic congestion in today's high-speed point-to-point networks has little, if any, effect on application-level performance and that modeling end-point congestion is sufficient for a reasonably accurate simulation. (Also cross-referenced as UMIACS-TR-96-68)

Item: Deferred Data-Flow Analysis: Algorithms, Proofs and Applications (1998-11-03)
Sharma, Shamik D.; Acharya, Anurag; Saltz, Joel

Loss of precision due to the conservative nature of compile-time dataflow analysis is a general problem and impacts a wide variety of optimizations. We propose a limited form of runtime dataflow analysis, called deferred dataflow analysis (DDFA), which attempts to sharpen dataflow results by using control-flow information that is available at runtime. The overheads of runtime analysis are minimized by performing the bulk of the analysis at compile time and deferring only a summarized version of the dataflow problem to runtime. Caching and reuse of dataflow results reduce these overheads further. DDFA is an interprocedural framework and can handle arbitrary control structures, including multi-way forks, recursion, separately compiled functions and higher-order functions. It is primarily targeted at the optimization of heavy-weight operations such as communication calls, where one can expect significant benefits from sharper dataflow analysis. We outline how DDFA can be used to optimize different kinds of heavy-weight operations such as bulk prefetching on distributed systems and dynamic linking in mobile programs. We prove that DDFA is safe and that it yields better dataflow information than strictly compile-time dataflow analysis.
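The core deferral idea can be sketched in a few lines. This is a minimal illustration under assumed conventions, not the authors' implementation: each code region is summarized at compile time as a hypothetical (gen, kill) pair over a set of names, and at runtime only the summaries for the regions actually executed are composed, giving sharper facts than a conservative merge over all possible paths.

```python
# Hypothetical sketch of deferred dataflow analysis: compile-time (gen, kill)
# region summaries are composed at runtime along the path actually taken.

def compose(summary_a, summary_b):
    """Sequentially compose two (gen, kill) summaries: b runs after a."""
    gen_a, kill_a = summary_a
    gen_b, kill_b = summary_b
    # Facts generated by a survive unless killed by b; b's gens always survive.
    return ((gen_a - kill_b) | gen_b, (kill_a - gen_b) | kill_b)

def runtime_facts(path, summaries, initial=frozenset()):
    """Fold the compile-time summaries of the regions actually executed."""
    gen, kill = frozenset(), frozenset()
    for region in path:
        gen, kill = compose((gen, kill), summaries[region])
    return (initial - kill) | gen

# Compile time: summaries for three regions of a program with a runtime fork
# (region names and facts are invented for illustration).
summaries = {
    "entry":   (frozenset({"x"}), frozenset()),
    "branchA": (frozenset({"y"}), frozenset({"x"})),
    "branchB": (frozenset({"z"}), frozenset()),
}

# A purely compile-time analysis must merge both branches conservatively;
# at runtime the taken path is known, so the result is exact for that run.
assert runtime_facts(["entry", "branchA"], summaries) == {"y"}
assert runtime_facts(["entry", "branchB"], summaries) == {"x", "z"}
```

The point of the summarization step is visible here: the runtime work is a cheap fold over precomputed pairs, not a re-run of the full analysis.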
(Also cross-referenced as UMIACS-TR-98-46)

Item: An Evaluation of Architectural Alternatives for Rapidly Growing Datasets: Active Disks, Clusters, SMPs (1998-12-08)
Uysal, Mustafa; Acharya, Anurag; Saltz, Joel

Growth and usage trends for several large datasets indicate that there is a need for architectures that scale processing power as the dataset grows. In this paper, we evaluate three architectural alternatives for rapidly growing and frequently reprocessed datasets: active disks, clusters, and shared-memory multiprocessors (SMPs). The focus of this evaluation is to identify potential bottlenecks in each of the alternative architectures and to determine their performance for the applications of interest. We evaluate these architectural alternatives using a detailed simulator and a suite of nine applications. Our results indicate that for most of these applications, active disk and cluster configurations achieved significantly better performance than SMP configurations. Active disk configurations were able to match (and in some cases improve upon) the performance of commodity cluster configurations. (Also cross-referenced as UMIACS-TR-98-68)

Item: An Interprocedural Framework for Placement of Asynchronous I/O Operations (1998-10-15)
Agrawal, Gagan; Acharya, Anurag; Saltz, Joel

Overlapping memory accesses with computation is a standard technique for improving performance on modern architectures, which have deep memory hierarchies. In this paper, we present a compiler technique for overlapping accesses to secondary memory (disks) with computation. We have developed an Interprocedural Balanced Code Placement (IBCP) framework, which performs analysis on arbitrary recursive procedures and arbitrary control flow and replaces synchronous I/O operations with a balanced pair of asynchronous operations. We demonstrate how this analysis is useful for applications which perform frequent and large accesses to secondary memory, including applications which snapshot or checkpoint their computations, and out-of-core applications. (Also cross-referenced as UMIACS-TR-95-114)

Item: Mobile Streams (1998-10-15)
Ranganathan, M.; Acharya, Anurag; Andrey, Laurent; Schaal, Virginie; Saltz, Joel

A large class of distributed testing, control and collaborative applications are reactive or event-driven in nature. Such applications can be structured as a set of handlers that react to events and that in turn can trigger other events. We have developed an application-building toolkit that facilitates the development of such applications. Our system is based on the concept of Mobile Streams.
Applications developed in our system are dynamically extensible and reconfigurable, and our system provides the application designer with a means to control how the system can be extended and reconfigured. We describe our system model and implementation and compare our design to the designs of other systems. (Also cross-referenced as UMIACS-TR-98-36)

Item: Network-Aware Mobile Programs (1998-10-15)
Ranganathan, M.; Acharya, Anurag; Sharma, Shamik D.; Saltz, Joel

In this paper, we investigate network-aware mobile programs: programs that can use mobility as a tool to adapt to variations in network characteristics. We present infrastructural support for mobility and network monitoring and show how adaptalk, a Java-based mobile Internet chat application, can take advantage of this support to dynamically place the chat server so as to minimize response time. Our conclusion is that on-line network monitoring and adaptive placement of shared data structures can significantly improve the performance of distributed applications on the Internet. (Also cross-referenced as UMIACS-TR-96-46)

Item: Requirements of I/O Systems for Parallel Machines: An Application-driven Study (1998-10-15)
Uysal, Mustafa; Acharya, Anurag; Saltz, Joel

I/O-intensive parallel programs have emerged as one of the leading consumers of cycles on parallel machines. This change has been driven by two trends. First, parallel scientific applications are being used to process larger datasets that do not fit in memory. Second, a large number of parallel machines are being used for non-scientific applications. Efficient execution of these applications requires high-performance I/O systems designed to meet their I/O requirements. In this paper, we examine the I/O requirements of data-intensive parallel applications and the implications of these requirements for the design of I/O systems for parallel machines. We attempt to answer the following questions.
First, what are the steady-state as well as peak I/O rates required? Second, what spatial patterns, if any, occur in the sequence of I/O requests for individual applications? Third, what is the degree of intra-processor and inter-processor locality in I/O accesses? Fourth, does the application structure allow programmers to disclose future I/O requests to the I/O system? Fifth, what patterns, if any, exist in the sequence of inter-arrival times of I/O requests? To address these questions, we have analyzed I/O request traces for a diverse set of I/O-intensive parallel applications. This set includes seven scientific applications and four non-scientific applications. (Also cross-referenced as UMIACS-TR-97-49)

Item: Runtime Coupling of Data-parallel Programs (1998-10-15)
Ranganathan, M.; Acharya, Anurag; Edjlali, Guy; Sussman, Alan; Saltz, Joel

We consider the problem of efficiently coupling multiple data-parallel programs at runtime. We propose an approach that establishes a mapping between data structures in different data-parallel programs and implements a user-specified consistency model. Mappings are established at runtime, and new mappings between programs can be added and deleted while the programs are executing. Neither the mappings nor the identity of the processors involved has to be known at compile time or even link time. Programs can be made to interact at different granularities of interaction without requiring any re-coding. A priori knowledge of data movement requirements allows buffering of data and overlap of computation between coupled applications. Efficient data movement is achieved by pre-computing an optimized schedule. We describe our prototype implementation and evaluate its performance on a set of synthetic benchmarks that examine the variation of performance with coupling parameters.
We demonstrate that the cost of the added flexibility gained by our coupling method is not prohibitive when compared with a monolithic code that performs the same computation. (Also cross-referenced as UMIACS-TR-95-116)

Item: A Study of Internet Round-Trip Delay (1998-10-15)
Acharya, Anurag; Saltz, Joel

We present the results of a study of Internet round-trip delay. The links chosen include links to frequently accessed commercial hosts as well as well-known academic and foreign hosts. Each link was studied for a 48-hour period. We attempt to answer the following questions: (1) how rapidly and in what manner does the delay change? In this study, we focus on medium-grain (seconds/minutes) and coarse-grain (tens of minutes/hours) time-scales; (2) what does the frequency distribution of delay look like, and how rapidly does it change? (3) what is a good metric to characterize the delay for the purpose of adaptation? Our conclusions are: (a) there is large temporal and spatial variation in round-trip time (RTT); (b) the RTT distribution is usually unimodal and asymmetric, with a long tail on the right-hand side; (c) RTT observations in most time periods are tightly clustered around the mode; (d) the mode is a good characteristic value for RTT distributions; (e) RTT distributions change slowly; (f) persistent changes in RTT occur slowly, and sharp changes are undone very shortly; (g) jitter in RTT observations is small; and (h) the inherent RTT occurs frequently. (Also cross-referenced as UMIACS-TR-96-97)

Item: Study of Scalable Declustering Algorithms for Parallel Grid Files (1998-10-15)
Moon, Bongki; Acharya, Anurag; Saltz, Joel

Efficient storage and retrieval of large multidimensional datasets is an important concern for large-scale scientific computations such as long-running time-dependent simulations which periodically generate snapshots of their state. The main challenge in efficiently handling such datasets is to minimize response time for multidimensional range queries.
The grid file is one of the well-known access methods for multidimensional and spatial data. We investigate effective and scalable declustering techniques for grid files, with the primary goal of minimizing response time and the secondary goal of maximizing the fairness of the data distribution. The main contributions of this paper are (1) an analytic and experimental evaluation of existing index-based declustering techniques and their extensions for grid files, and (2) the development of a proximity-based declustering algorithm called minimax, which is experimentally shown to scale and to consistently achieve better response time than available algorithms while maintaining a perfect disk distribution. (Also cross-referenced as UMIACS-TR-96-4)

Item: T2: A Customizable Parallel Database For Multi-dimensional Data (1998-10-15)
Chang, Chialin; Acharya, Anurag; Sussman, Alan; Saltz, Joel

As computational power and storage capacity increase, processing and analyzing large volumes of multi-dimensional datasets play an increasingly important part in many domains of scientific research. Several database research groups and vendors have developed object-relational database systems to provide some support for managing and/or visualizing multi-dimensional datasets. These systems, however, provide little or no support for analyzing or processing these datasets -- the assumption is that this is too application-specific to warrant common support.
As a result, applications that process these datasets are usually decoupled from data storage and management, resulting in inefficiency due to copying and loss of locality. Furthermore, every application developer has to implement complex support for managing and scheduling the processing. Our study of a large set of scientific applications over the past three years indicates that the processing for such datasets is often highly stylized and shares several important characteristics. Usually, both the input dataset and the result being computed have underlying multi-dimensional grids. The basic processing step usually consists of transforming individual input items, mapping the transformed items to the output grid, and computing output items by aggregating, in some way, all the transformed input items mapped to the corresponding grid point. In this paper, we present the design of T2, a customizable parallel database that integrates storage, retrieval and processing of multi-dimensional datasets. T2 provides support for common operations including index generation, data retrieval, memory management, scheduling of processing across a parallel machine, and user interaction. It achieves its primary advantage from its ability to seamlessly integrate data retrieval and processing for a wide variety of applications, and to maintain and jointly process multiple datasets with different underlying grids. (Also cross-referenced as UMIACS-TR-98-04)

Item: Titan: A High-Performance Remote-Sensing Database (1998-10-15)
Chang, Chialin; Moon, Bongki; Acharya, Anurag; Shock, Carter; Sussman, Alan; Saltz, Joel

There are two major challenges for a high-performance remote-sensing database. First, it must provide low-latency retrieval of very large volumes of spatio-temporal data. This requires effective declustering and placement of a multi-dimensional dataset onto a large disk farm.
Second, the order-of-magnitude reduction in data size due to post-processing makes it imperative, from a performance perspective, that the post-processing be done on the machine that holds the data. This requires careful coordination of computation and data retrieval. This paper describes the design, implementation and evaluation of Titan, a parallel shared-nothing database designed for handling remote-sensing data. The computational platform for Titan is a 16-processor IBM SP-2 with four fast disks attached to each processor. Titan is currently operational and contains about 24 GB of data from the Advanced Very High Resolution Radiometer (AVHRR) on the NOAA-7 satellite. The experimental results show that Titan provides good performance for global queries and interactive response times for local queries. (Also cross-referenced as UMIACS-TR-96-67)

Item: The Utility of Exploiting Idle Workstations for Parallel Computation (1998-10-15)
Acharya, Anurag; Edjlali, Guy; Saltz, Joel

In this paper, we examine the utility of exploiting idle workstations for parallel computation. We attempt to answer the following questions. First, given a workstation pool, for what fraction of time can we expect to find a cluster of k workstations available? This provides an estimate of the opportunity for parallel computation. Second, how stable is a cluster of free machines, and how does the stability vary with the size of the cluster? This indicates how frequently a parallel computation might have to stop to adapt to changes in processor availability. Third, what is the distribution of workstation idle times? This information is useful for selecting workstations on which to place computation. Fourth, how much benefit can a user expect? To state this in concrete terms: given a pool of size S, how big a parallel machine should one expect to get for free by harvesting idle machines?
Finally, how much benefit can be achieved on a real machine and how hard does a parallel programmer have to work to make this happen? To answer the workstation-availability questions, we have analyzed 14-day traces from three workstation pools. To determine the equivalent parallel machine, we have simulated the execution of a group of well-known parallel programs on these workstation pools. To gain an understanding of the practical problems, we have developed the system support required for adaptive parallel programs as well as an adaptive parallel CFD application. (Also cross-referenced as UMIACS-TR-96-80)
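The first availability question above reduces to an interval-coverage computation over idle-time traces. The sketch below is illustrative only (invented trace data, not the paper's methodology or results): a sweep line over interval endpoints counts how many workstations are simultaneously idle and accumulates the time during which at least k are free.

```python
# Hypothetical sketch: fraction of a trace during which at least k
# workstations are simultaneously idle, via a sweep line over the
# start/end events of each machine's idle intervals.

def fraction_available(idle_intervals, k, trace_length):
    """idle_intervals: one list of (start, end) idle periods per workstation."""
    events = []
    for intervals in idle_intervals:
        for start, end in intervals:
            events.append((start, +1))   # machine becomes idle
            events.append((end, -1))     # machine is reclaimed
    events.sort()                        # ties: -1 sorts before +1 (conservative)
    free, prev_t, covered = 0, 0.0, 0.0
    for t, delta in events:
        if free >= k:                    # k or more machines were free on [prev_t, t)
            covered += t - prev_t
        free += delta
        prev_t = t
    return covered / trace_length

# Three workstations over a 10-hour trace (hours, invented data).
traces = [[(0, 4), (6, 10)], [(2, 8)], [(3, 5), (7, 9)]]
# At least 2 machines are free during [2, 5] and [6, 9]: 6 of 10 hours.
assert abs(fraction_available(traces, 2, 10.0) - 0.6) < 1e-9
# All 3 are free only during [3, 4] and [7, 8]: 2 of 10 hours.
assert abs(fraction_available(traces, 3, 10.0) - 0.2) < 1e-9
```

Running this over sliding windows of a real 14-day trace, rather than one toy interval set, would give the availability-versus-k curves the abstract alludes to.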