Browsing by Author "Uysal, Mustafa"
Item A Customizable Simulator for Workstation Networks (1998-10-15)
Uysal, Mustafa; Acharya, Anurag; Bennett, Robert; Saltz, Joel
We present a customizable simulator called netsim for high-performance point-to-point workstation networks that is accurate enough to be used for application-level performance analysis, yet easy enough to customize for multiple architectures and software configurations. Customization is accomplished without any proprietary information, using only publicly available hardware specifications and information that can be readily determined with a suite of test programs. We customized netsim for two platforms: a 16-node IBM SP-2 with a multistage network and a 10-node DEC Alpha Farm with an ATM switch. We show that netsim successfully models these two architectures with a 2-6% error on the SP-2 and a 10% error on the Alpha Farm for most test cases. It achieves this accuracy at the cost of a 7-36 fold simulation slowdown with respect to the SP-2 and a 3-8 fold slowdown with respect to the Alpha Farm. In addition, we show that cross-traffic congestion in today's high-speed point-to-point networks has little, if any, effect on application-level performance, and that modeling end-point congestion is sufficient for a reasonably accurate simulation.
(Also cross-referenced as UMIACS-TR-96-68)

Item An Evaluation of Architectural Alternatives for Rapidly Growing Datasets: Active Disks, Clusters, SMPs (1998-12-08)
Uysal, Mustafa; Acharya, Anurag; Saltz, Joel
Growth and usage trends for several large datasets indicate that there is a need for architectures that scale the processing power as the dataset increases. In this paper, we evaluate three architectural alternatives for rapidly growing and frequently reprocessed datasets: active disks, clusters, and shared memory multiprocessors (SMPs).
The focus of this evaluation is to identify potential bottlenecks in each of the alternative architectures and to determine the performance of these architectures for the applications of interest. We evaluate these architectural alternatives using a detailed simulator and a suite of nine applications. Our results indicate that for most of these applications, Active Disk and cluster configurations were able to achieve significantly better performance than SMP configurations. Active Disk configurations were able to match (and in some cases improve upon) the performance of commodity cluster configurations.
(Also cross-referenced as UMIACS-TR-98-68)
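The evaluations above rest on detailed simulators whose network component reduces, at its core, to a parameterized point-to-point message cost model customized per platform. A minimal sketch of such a linear (startup + per-byte) model follows; the parameter values are purely illustrative, not netsim's measured constants.

```python
# Sketch of a parameterized point-to-point message cost model, the kind
# a customizable simulator like netsim derives per platform from public
# hardware specs and micro-benchmark test programs.
# All constants below are illustrative, NOT measured netsim values.

def message_time_us(size_bytes: int, startup_us: float, per_byte_us: float) -> float:
    """Linear (latency + bandwidth) cost of one point-to-point message."""
    return startup_us + size_bytes * per_byte_us

# Hypothetical per-platform parameter sets.
PLATFORMS = {
    "sp2_like": {"startup_us": 40.0, "per_byte_us": 0.03},   # multistage network
    "atm_like": {"startup_us": 250.0, "per_byte_us": 0.07},  # ATM switch
}

def simulate_transfer(platform: str, size_bytes: int) -> float:
    """Predicted transfer time in microseconds on the named platform."""
    p = PLATFORMS[platform]
    return message_time_us(size_bytes, p["startup_us"], p["per_byte_us"])
```

Under this kind of model, customizing for a new platform amounts to measuring two constants with test programs rather than modeling the switch fabric, which is consistent with the observation above that end-point congestion, not cross-traffic, dominates application-level performance.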
Item Index Translation Schemes for Adaptive Computations on Distributed Memory Multicomputers (1998-10-15)
Moon, Bongki; Uysal, Mustafa; Saltz, Joel
Current research in parallel programming is focused on closing the gap between globally indexed algorithms and the separate address spaces of processors on distributed memory multicomputers. A set of index translation schemes has been implemented as part of the CHAOS runtime support library, so that the library functions can be used to implement a global index space across a collection of separate local index spaces. These schemes include software-cached translation schemes aimed at adaptive irregular problems, as well as a distributed translation table technique for statically irregular problems. To evaluate and demonstrate the efficiency of the software-cached translation schemes, experiments have been performed with an adaptively irregular loop kernel and a full-fledged 3D DSMC code from NASA Langley on the Intel Paragon and Cray T3D. This paper also discusses and analyzes the operational conditions under which each scheme can produce optimal performance.
(Also cross-referenced as UMIACS-TR-95-28)

Item A Manual for the CHAOS Runtime Library (1998-10-15)
Saltz, Joel; Ponnusamy, Ravi; Sharma, Shamik D.; Moon, Bongki; Hwang, Yuan-Shin; Uysal, Mustafa; Das, Raja
Procedures are presented that are designed to help users efficiently program irregular problems (e.g. unstructured mesh sweeps, sparse matrix codes, adaptive mesh partial differential equation solvers) on distributed memory machines. These procedures are also designed for use in compilers for distributed memory multiprocessors. The portable CHAOS procedures are designed to support dynamic data distributions and to automatically generate send and receive messages by capturing communication patterns at runtime.
(Also cross-referenced as UMIACS-TR-95-34)

Item A Performance Prediction Framework for Data Intensive Applications on Large Scale Parallel Machines (1998-10-15)
Uysal, Mustafa; Kurc, Tahsin M.; Sussman, Alan; Saltz, Joel
This paper presents a simulation-based performance prediction framework for data-intensive applications on large scale parallel machines. Our framework consists of two components: application emulators and a suite of simulators. Application emulators provide a parameterized model of the data access and computation patterns of the applications, and allow critical application components (input data partitioning, data declustering, processing structure, etc.) to be changed easily and flexibly. Our suite of simulators models the I/O and communication subsystems with good accuracy and executes quickly on a high-performance workstation, allowing performance prediction of large scale parallel machine configurations. The key to efficient simulation of very large scale configurations is a technique called loosely-coupled simulation, in which the processing structure of the application is embedded in the simulator while preserving data dependencies and data distributions. We evaluate our performance prediction tool using a set of three data-intensive applications.
(Also cross-referenced as UMIACS-TR-98-39)

Item Requirements of I/O Systems for Parallel Machines: An Application-driven Study (1998-10-15)
Uysal, Mustafa; Acharya, Anurag; Saltz, Joel
I/O-intensive parallel programs have emerged as one of the leading consumers of cycles on parallel machines. This change has been driven by two trends. First, parallel scientific applications are being used to process larger datasets that do not fit in memory. Second, a large number of parallel machines are being used for non-scientific applications. Efficient execution of these applications requires high-performance I/O systems designed to meet their I/O requirements.
In this paper, we examine the I/O requirements of data-intensive parallel applications and the implications of these requirements for the design of I/O systems for parallel machines. We attempt to answer the following questions. First, what steady-state and peak I/O rates are required? Second, what spatial patterns, if any, occur in the sequence of I/O requests of individual applications? Third, what is the degree of intra-processor and inter-processor locality in I/O accesses? Fourth, does the application structure allow programmers to disclose future I/O requests to the I/O system? Fifth, what patterns, if any, exist in the sequence of inter-arrival times of I/O requests? To address these questions, we have analyzed I/O request traces for a diverse set of I/O-intensive parallel applications, including seven scientific applications and four non-scientific applications.
(Also cross-referenced as UMIACS-TR-97-49)
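The locality and inter-arrival questions in the last entry reduce to simple statistics computed over per-processor request traces. A minimal sketch follows, assuming a hypothetical trace record of (processor id, timestamp in seconds, file offset, request size); this record layout is an illustration, not the study's actual trace format.

```python
# Hypothetical trace record: (proc_id, time_s, offset_bytes, size_bytes).
# Sketch of two of the statistics such a trace study extracts.

def interarrival_times(trace):
    """Gaps between successive request timestamps, across all processors."""
    times = sorted(t for _, t, _, _ in trace)
    return [b - a for a, b in zip(times, times[1:])]

def sequential_fraction(trace):
    """Fraction of each processor's requests that begin exactly where that
    processor's previous request ended (one view of intra-processor locality)."""
    per_proc = {}
    for proc, _, off, size in sorted(trace, key=lambda r: r[1]):
        per_proc.setdefault(proc, []).append((off, size))
    sequential = total = 0
    for reqs in per_proc.values():
        for (off1, size1), (off2, _) in zip(reqs, reqs[1:]):
            total += 1
            sequential += (off2 == off1 + size1)
    return sequential / total if total else 0.0
```

Statistics like these, computed per application, are what distinguish sequential scan-style workloads (high sequential fraction, regular inter-arrival gaps) from irregular access patterns, and thus drive the I/O system design implications the study draws.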