Browsing by Author "Pugh, William"
Now showing 1 - 20 of 23
Results Per Page
Sort Options
Item Code Generation for Multiple Mappings(1998-10-15) Kelly, Wayne; Pugh, William; Rosser, EvanThere has been a great amount of recent work toward unifying iteration reordering transformations. Many of these approaches represent transformations as affine mappings from the original iteration space to a new iteration space. These approaches show a great deal of promise, but they all rely on the ability to generate code that iterates over the points in these new iteration spaces in the appropriate order. This problem has been fairly well-studied in the case where all statements use the same mapping. We have developed an algorithm for the less well-studied case where each statement uses a potentially different mapping. Unlike many other approaches, our algorithm can also generate code from mappings corresponding to loop blocking. We address the important trade-off between reducing control overhead and duplicating code. (Also cross-referenced as UMIACS-TR-94-87.1)Item Concurrent Maintenance of Skip Lists(1998-10-15) Pugh, WilliamThis papers describes a new approach to providing efficient concurrent access to a dynamic search structure. Previous approaches have attempted to solve this problem using search trees (either balanced or unbalanced). We describe methods for performing concurrent access and updates using skip lists. Skip lists are a probabilistic alternative to balanced trees that provide much of the simplicity of unbalanced trees, together with good worst-case expected performance. In this paper, we briefly review skip lists, describe simple methods for concurrent maintenance of sorted linked lists, formally prove the correctness of these methods, and show how they can be extended to provide simple and efficient algorithms for concurrent maintenance of skip lists. (Also cross-referenced as UMIACS-TR-90-80)Item Counting Solutions to Presburger Formulas: How and Why(1998-10-15) Pugh, WilliamWe describe methods that are able to count the number of integer solutions to selected free variables of a Presburger formula, or sum a polynomial over all integer solutions of selected free variables of a Presburger formula. This answer is given symbolically, in terms of symbolic constants (the remaining free variables in the Presburger formula). For example, we can create a Presburger formula who's solutions correspond to the iterations of a loop. By counting these, we obtain an estimate of the execution time of the loop. In more complicated applications, we can create Presburger formulas who's solutions correspond to the distinct memory locations or cache lines touched by a loop, the flops executed by a loop, or the array elements that need to be communicated at a particular point in a distributed computation. By counting the number of solutions, we can evaluate the computation/memory balance of a computation, determine if a loop is load balanced and evaluate message traffic and allocate message buffers. (Also cross-referenced as UMIACS-TR-94-27)Item Definitions of Dependence Distance(1998-10-15) Pugh, WilliamData dependence distance is widely used to characterize data dependences in advanced optimizing compilers. The standard definition of dependence distance assumes that loops are normalized (have constant lower bounds and a step of 1); there is not a commonly accepted definition for unnormalized loops. We have identified several potential definitions, all of which give the same answer for normalized loops. There are a number of subtleties involved in choosing between these definitions, and no one definition is suitable for all applications. (Also cross-referenced as UMIACS-TR-93-133)Item Determing Schedules based on Performance Estimation(1998-10-15) Kelly, Wayne; Pugh, WilliamIn previous work, we presented a framework for unifying iteration reordering transformations such as loop interchange, loop distribution, loop skewing and statement reordering. The framework provides a uniform way to represent and reason about transformations. However, it does not provide a way to decide which transformation(s) should be applied to a given program. This paper describes a way to make such decisions within the context of the framework. The framework is based on the idea that a transformation can be represented as a schedule that maps the original iteration space to a new iteration space. We show how we can estimate the performance of a program by considering only the schedule from which it was produced. We also show how to produce an upper bound on performance given only a partially specified schedule. Our ability to estimate performance directly from schedules and to do so even for partially specified schedules allows us to efficiently find schedules which will produce good code. (Also cross-referenced as UMIACS-TR-93-67)Item Eliminating False Data Dependences using the Omega Test(1998-10-15) Pugh, William; Wonnacott, DavidArray data dependence analysis methods currently in use generate false dependences that can prevent useful program transformations. These false dependences arise because the questions asked are conservative approximations to the questions we really should be asking. Unfortunately, the questions we really should be asking go beyond integer programming and require decision procedures for a subclass of Presburger formulas. In this paper, we describe how to extend the Omega test so that it can answer these queries and allow us to eliminate these false data dependences. We have implemented the techniques described here and believe they are suitable for use in production compilers. (An earlier version of this paper appeared at the ACM SIGPLAN PLDI'92 conference). (Also cross-referenced as UMIACS-TR-93-132)Item An Exact Method for Analysis of Value-based Array Data Dependences(1998-10-15) Pugh, William; Wonnacott, DavidStandard array data dependence testing algorithms give information about the aliasing of array references. If statement 1 writes a[5], and statement 2 later reads a[5], standard techniques described this as a flow dependence, even if there was an intervening write. We call a dependence between two references to the same memory location a memory-based dependence. In contrast, if there are no intervening writes, the references touch the same value and we call the dependence a value-based dependence. There has been a surge of recent work on value-based array data dependence analysis (also referred to as computation of array data-flow dependence information). In this paper, we describe a technique that is exact over programs without control flow (other than loops) and non-linear references. We compare our proposal with the technique proposed by Paul Feautrier, which is the other technique that is complete over the same domain as ours. We also compare our work with that of Tu and Padua, a representative approximate scheme for array privatization. (Also cross-referenced as UMIACS-TR-93-137)Item Experiences with Constraint-based Array Dependence Analysis(1998-10-15) Pugh, William; Wonnacott, DavidArray data dependence analysis provides important information for optimization of scientific programs. Array dependence testing can be viewed as constraint analysis, although traditionally general-purpose constraint manipulation algorithms have been thought to be too slow for dependence analysis. We have explored the use of exact constraint analysis, based on Fourier's method, for array data dependence analysis. We have found these techniques can be used without a great impact on total compile time. Furthermore, the use of general-purpose algorithms has allowed us to address problems beyond traditional dependence analysis. In this paper, we summarize some of the constraint manipulation techniques we use for dependence analysis, and discuss some of the reasons for our performance results. (Also cross-referenced as UMIACS-TR-94-122)Item Exploiting Monotone Convergence Functions in Parallel Programs(1998-10-15) Pugh, William; Rosser, Evan; Shpeisman, TatianaScientific codes which use iterative methods are often difficult to parallelize well. Such codes usually contain \code{while} loops which iterate until they converge upon the solution. Problems arise since the number of iterations cannot be determined at compile time, and tests for termination usually require a global reduction and an associated barrier. We present a method which allows us avoid performing global barriers and exploit pipelined parallelism when processors can detect non-convergence from local information. (Also cross-referenced as UMIACS-TR-96-31.1)Item Finding Legal Reordering Transformations using Mappings(1998-10-15) Kelly, Wayne; Pugh, WilliamTraditionally, optimizing compilers attempt to improve the performance of programs by applying source to source transformations, such as loop interchange, loop skewing and loop distribution. Each of these transformations has its own special legality checks and transformation rules which make it hard to analyze or predict the effects of compositions of these transformations. To overcome these problems we have developed a framework for unifying iteration reordering transformations. The framework is based on the idea that all reordering transformation can be represented as a mapping from the original iteration space to a new iteration space. The framework is designed to provide a uniform way to represent and reason about transformations. An optimizing compiler would use our framework by finding a mapping that both corresponds to a legal transformation and produces efficient code. We present the mapping selection problem as a search problem by decomposing it into a sequence of smaller choices. We then characterize the set of all legal mappings by defining an implicit search tree. (Also cross-referenced as UMIACS-TR-94-71)Item A Framework for Unifying Reordering Transformations(1998-10-15) Kelly, Wayne; Pugh, WilliamWe present a framework for unifying iteration reordering transformations such as loop interchange, loop distribution, skewing, tiling, index set splitting and statement reordering. The framework is based on the idea that a transformation can be represented as a schedule that maps the original iteration space to a new iteration space. The framework is designed to provide a uniform way to represent and reason about transformations. As part of the framework, we provide algorithms to assist in the building and use of schedules. In particular, we provide algorithms to test the legality of schedules, to align schedules and to generate optimized code for schedules. (Also cross-referenced as UMIACS-TR-93-134)Item Identifying Reordering Transformations That Minimize Idle Processor Time(1998-10-15) Kelly, Wayne; Pugh, WilliamItem Iteration Space Slicing and Its Application to Communication Optimization(1998-10-15) Pugh, William; Rosser, EvanProgram slicing is an analysis that answers questions such as ``Which statements might affect the computation of variable $v$ at statement $s$?'' or ``Which statements depend on the value of $v$ computed in statement $s$?''. The answers computed by program slicing are generally a set of statements. We introduce the idea of {\em iteration spacing slicing}: we refine program slicing to ask questions such as ``Which iterations of which statements might effect the computation in iterations $I$ of statement $s$?'' or ``Which iterations of which statements depend on the value computed by iterations $I$ of statement $s$?''. One application of this general-purpose technique is optimization of interprocessor communication in data-parallel compilers. For example, we can separate a code fragment into 1) those iterations that must be done before a send, 2) those iterations that don't need to be done before a send and don't depend on non-local data and 3), those iterations that depend on non-local data. We examine applications of iteration space slicing to communication optimizations in parallel executions of programs such as stencil computations and block-cyclic Gaussian elimination with partial pivoting. (Also cross-referenced as UMIACS-TR-97-02)Item Minimizing Communication while Preserving Parallelism(1998-10-15) Kelly, Wayne; Pugh, WilliamTo compile programs for message passing architectures and to obtain good performance on NUMA architectures it is necessary to control how computations and data are mapped to processors. Languages such as High-Performance Fortran use data distributions supplied by the programmer and the owner computes rule to specify this. However, the best data and computation decomposition may differ from machine to machine and require substantial expertise to determine. Therefore, automated decomposition is desirable. All existing methods for automated data/computation decomposition share a common failing: they are very sensitive to the original loop structure of the program. While they find a good decomposition for that loop structure, it may be possible to apply transformations (such as loop interchange and distribution) so that a different decomposition gives even better results. We have developed automatic data/computation decomposition methods that are not sensitive to the original program structure. We can model static and dynamic data decompositions as well as computation decompositions that cannot be represented by data decompositions and the owner computes rule. We make use of both parallel loops and doacross/pipelined loops to exploit parallelism. We describe an automated translation of the decomposition problem into a weighted graph that incorporates estimates of both parallelism and communication for various candidate computation decompositions. We solve the resulting search problem exactly in a very short time using a new algorithm that has shown to be able to prune away a majority of the vast search space. We assume that the selection of the computation decomposition is followed by a transformation phase that reorders the iterations to best match the selected computation decomposition. Our graph includes constraints to ensure that a reordering transformation giving the predicted parallelism exists. (Also cross-referenced as UMIACS-TR-95-118)Item Model Checking Concurrent Systems with Unbounded Integer Variables: Symbolic Representations, Approximations and Experimental Results(1998-10-15) Bultan, Tevfik; Gerber, Richard; Pugh, WilliamModel checking is a powerful technique for analyzing large, finite-state systems. In an infinite-state system, however, many basic properties are undecidable. In this paper, we present a new symbolic model checker which conservatively evaluates safety and liveness properties on infinite-state programs. We use Presburger formulas to symbolically encode a program's transition system, as well as its model-checking computations. All fixpoint calculations are executed symbolically, and their convergence is guaranteed by using approximation techniques. We demonstrate the promise of this technology on some well-known infinite-state concurrency problems. (Also cross-referenced as UMIACS-TR-98-07)Item Nonlinear Array Dependence Analysis(1998-10-15) Pugh, William; Wonnacott, DavidStandard array data dependence techniques can only reason about linear constraints. There has also been work on analyzing some dependences involving polynomial constraints. Analyzing array data dependences in real-world programs requires handling many ``unanalyzable'' terms: subscript arrays, run-time tests, function calls. The standard approach to analyzing such programs has been to omit and ignore any constraints that cannot be reasoned about. This is unsound when reasoning about value-based dependences and whether privatization is legal. Also, this prevents us from determining the conditions that must be true to disprove the dependence. These conditions could be checked by a run-time test or verified by a programmer or aggressive, demand-driven interprocedural analysis. We describe a solution to these problems. Our solution makes our system sound and more accurate for analyzing value-based dependences and derives conditions that can be used to disprove dependences. We also give some preliminary results from applying our techniques to programs from the Perfect benchmark suite. (Also cross-referenced as UMIACS-TR-94-123)Item Semantics of Multithreaded Java(2001-05-10) Manson, Jeremy; Pugh, WilliamJava has integrated multithreading to a far greater extent than most programming languages. It is also one of the only languages that specifies and requires safety guarantees for improperly synchronized programs. It turns out that understanding these issues is far more subtle and difficult than was previously thought. The existing specification makes guarantees that prohibit standard and proposed compiler optimizations; it also omits guarantees that are necessary for safe execution of much existing code. Some guarantees that are made (e.g., type safety) raise tricky implementation issues when running unsynchronized code on SMPs with weak memory models. This paper reviews those issues. It proposes a new semantics for Java that allows for aggressive compiler optimization and addresses the safety and multithreading issues. (Cross-referenced as UMIACS-TR-2001-09)Item Simplifying Polynomial Constraints Over Integers to Make Dependence Analysis More Precise(1998-10-15) Maslov, Vadim; Pugh, WilliamWhy do existing parallelizing compilers and environments fail to parallelize many realistic FORTRAN programs? One of the reasons is that these programs contain a number of linearized array references, such as {\tt A(M*N*i+N*j+k)} or {\tt A(i*(i+1)/2+j)}. Performing exact dependence analysis for these references requires testing polynomial constraints for integer solutions. Most existing dependence analysis systems, however, restrict themselves to solving affine constraints only, so they have to make worst-case assumptions whenever they encounter a polynomial constraint. In this paper we introduce an algorithm which exactly and efficiently solves a class of polynomial constraints which arise in dependence testing. Another important application of our algorithm is to generate code for loop transformation known as symbolic blocking (tiling). (Also cross-referenced as UMIACS-TR-93-68.1)Item A Skip List Cookbook(1998-10-15) Pugh, WilliamSkip lists are a probabilistic list-based data structure that are a simple and efficient substitute for balanced trees. Skip list algorithms have the same asymptotic expected time bounds as balanced trees and are simpler, faster and use less space. The original paper on skip lists only presented algorithms for search, insertion and deletion. In this paper, we show that skip lists are as versatile as balanced trees. We describe and analyze algorithms to use search fingers, merge, split and concatenate skip lists, and implement linear list operations using skip lists. The skip list algorithms for these actions are simpler than their balanced tree cousins and at least as fast. The merge algorithm for skip lists we describe has better asymptotic time complexity than any previously described merge algorithm for balanced trees. (Also cross-referenced as UMIACS-TR-89-72.1)Item Static Analysis of Upper and Lower Bounds on Dependences and Parallelism(1998-10-15) Pugh, William; Wonnacott, DavidExisting compilers often fail to parallelize sequential code, even when a program can be manually transformed into parallel form by a sequence of well-understood transformations (as is the case for many of the Perfect Club Benchmark programs). These failures can occur for several reasons: the code transformations implemented in the compiler may not be sufficient to produce parallel code, the compiler may not find the proper sequence of transformations, or the compiler may not be able to prove that one of the necessary transformations is legal. When a compiler extract sufficient parallelism from a program, the programmer extract additional parallelism. Unfortunately, the programmer is typically left to search for parallelism without significant assistance. The compiler generally does not give feedback about which parts of the program might contain additional parallelism, or about the types of transformations that might be needed to realize this parallelism. Standard program transformations and dependence abstractions cannot be used to provide this feedback. In this paper, we propose a two step approach for the search for parallelism in sequential programs: We first construct several sets of constraints that describe, for each statement, which iterations of that statement can be executed concurrently. By constructing constraints that correspond to different assumptions about which dependences might be eliminated through additional analysis, transformations and user assertions, we can determine whether we can expose parallelism by eliminating dependences. In the second step of our search for parallelism, we examine these constraint sets to identify the kinds of transformations that are needed to exploit scalable parallelism. Our tests will identify conditional parallelism and parallelism that can be exposed by combinations of transformations that reorder the iteration space (such as loop interchange and loop peeling). This approach lets us distinguish inherently sequential code from code that contains unexploited parallelism. It also produces information about the kinds of transformations that will be needed to parallelize the code, without worrying about the order of application of the transformations. Furthermore, when our dependence test is inexact, we can identify which unresolved dependences inhibit parallelism by comparing the effects of assuming dependence or independence. We are currently exploring the use of this information in programmer-assisted parallelization. (Also cross-referenced as UMIACS-TR-94-40)