Minimizing Communication while Preserving Parallelism

Kelly, Wayne; Pugh, William

Minimizing Communication while Preserving Parallelism

dc.contributor.author	Kelly, Wayne	en_US
dc.contributor.author	Pugh, William	en_US
dc.date.accessioned	2004-05-31T22:36:26Z
dc.date.available	2004-05-31T22:36:26Z
dc.date.created	1995-12	en_US
dc.date.issued	1998-10-15	en_US
dc.description.abstract	To compile programs for message passing architectures and to obtain good performance on NUMA architectures it is necessary to control how computations and data are mapped to processors. Languages such as High-Performance Fortran use data distributions supplied by the programmer and the owner computes rule to specify this. However, the best data and computation decomposition may differ from machine to machine and require substantial expertise to determine. Therefore, automated decomposition is desirable. All existing methods for automated data/computation decomposition share a common failing: they are very sensitive to the original loop structure of the program. While they find a good decomposition for that loop structure, it may be possible to apply transformations (such as loop interchange and distribution) so that a different decomposition gives even better results. We have developed automatic data/computation decomposition methods that are not sensitive to the original program structure. We can model static and dynamic data decompositions as well as computation decompositions that cannot be represented by data decompositions and the owner computes rule. We make use of both parallel loops and doacross/pipelined loops to exploit parallelism. We describe an automated translation of the decomposition problem into a weighted graph that incorporates estimates of both parallelism and communication for various candidate computation decompositions. We solve the resulting search problem exactly in a very short time using a new algorithm that has shown to be able to prune away a majority of the vast search space. We assume that the selection of the computation decomposition is followed by a transformation phase that reorders the iterations to best match the selected computation decomposition. Our graph includes constraints to ensure that a reordering transformation giving the predicted parallelism exists. (Also cross-referenced as UMIACS-TR-95-118)	en_US
dc.format.extent	204452 bytes
dc.format.mimetype	application/postscript
dc.identifier.uri	http://hdl.handle.net/1903/781
dc.language.iso	en_US
dc.relation.isAvailableAt	Digital Repository at the University of Maryland	en_US
dc.relation.isAvailableAt	University of Maryland (College Park, Md.)	en_US
dc.relation.isAvailableAt	Tech Reports in Computer Science and Engineering	en_US
dc.relation.isAvailableAt	UMIACS Technical Reports	en_US
dc.relation.ispartofseries	UM Computer Science Department; CS-TR-3571	en_US
dc.relation.ispartofseries	UMIACS; UMIACS-TR-95-118	en_US
dc.title	Minimizing Communication while Preserving Parallelism	en_US
dc.type	Technical Report	en_US

Files

Original bundle

Now showing 1 - 2 of 2

Name:: CS-TR-3571.ps
Size:: 199.66 KB
Format:: Postscript Files

Download

Name:: CS-TR-3571.pdf
Size:: 246.31 KB
Format:: Adobe Portable Document Format
Description:: Auto-generated copy of CS-TR-3571.ps

Download

Collections

Technical Reports from UMIACS
Technical Reports of the Computer Science Department