Models for Advancing PRAM and Other Algorithms into Parallel Programs for a PRAM-On-Chip Platform
Publication or External Link
A bold vision that guided this work is as follows: (i) a parallel algorithms and programming course could become a standard course in every undergraduate computer science program, and (ii) this course could be coupled with a so-called PRAM-On-Chip architecture --- a commodity high-end multi-core computer architecture. In fact, the current paper is a tutorial on how to convert PRAM algorithms into efficient PRAM-On-Chip programs. Coupled with a text on PRAM algorithms as well as an available PRAM-On-Chip tool-chain, comprising a compiler and a simulator, the paper provides the missing link for upgrading a standard theoretical PRAM algorithms class to a parallel algorithms and programming class. Having demonstrated that such a course could cover similar programming projects and material to what is covered by a typical first serial algorithms and programming course, the paper suggests that parallel programming in the emerging multi-core era does not need to be more difficult than serial programming. If true, a powerful answer to the so-called parallel programming open problem is being provided. This open problem is currently the main stumbling block for the industry in getting the upcoming generation of multi-core architectures to improve single task completion time using easy-to-program application programmer interfaces. Known constraints of this open problem, such as backwards compatibility on serial code, are also addressed by the overall approach.
More concretely, a widely used methodology for advancing parallel algorithmic thinking into parallel algorithms is revisited, and is extended into a methodology for advancing parallel algorithms to PRAM-On-Chip programs. A performance cost model for the PRAM-On-Chip is also presented. It uses as complexity metrics the length of sequence of round trips to memory (LSRTM) and queuing delay (QD) from memory access queues, in addition to standard PRAM computation costs. Highlighting the importance of LSRTM in determining performance is another contribution of the paper. Finally, some alternatives to PRAM algorithms, which, on one hand, are easier-to-think, but, on the other hand, suppress more architecture details, are also discussed.