Minimizing Communication while Preserving Parallelism

dc.contributor.authorKelly, Wayneen_US
dc.contributor.authorPugh, Williamen_US
dc.date.accessioned2004-05-31T22:36:26Z
dc.date.available2004-05-31T22:36:26Z
dc.date.created1995-12en_US
dc.date.issued1998-10-15en_US
dc.description.abstractTo compile programs for message passing architectures and to obtain good performance on NUMA architectures it is necessary to control how computations and data are mapped to processors. Languages such as High-Performance Fortran use data distributions supplied by the programmer and the owner computes rule to specify this. However, the best data and computation decomposition may differ from machine to machine and require substantial expertise to determine. Therefore, automated decomposition is desirable. All existing methods for automated data/computation decomposition share a common failing: they are very sensitive to the original loop structure of the program. While they find a good decomposition for that loop structure, it may be possible to apply transformations (such as loop interchange and distribution) so that a different decomposition gives even better results. We have developed automatic data/computation decomposition methods that are not sensitive to the original program structure. We can model static and dynamic data decompositions as well as computation decompositions that cannot be represented by data decompositions and the owner computes rule. We make use of both parallel loops and doacross/pipelined loops to exploit parallelism. We describe an automated translation of the decomposition problem into a weighted graph that incorporates estimates of both parallelism and communication for various candidate computation decompositions. We solve the resulting search problem exactly in a very short time using a new algorithm that has shown to be able to prune away a majority of the vast search space. We assume that the selection of the computation decomposition is followed by a transformation phase that reorders the iterations to best match the selected computation decomposition. Our graph includes constraints to ensure that a reordering transformation giving the predicted parallelism exists. (Also cross-referenced as UMIACS-TR-95-118)en_US
dc.format.extent204452 bytes
dc.format.mimetypeapplication/postscript
dc.identifier.urihttp://hdl.handle.net/1903/781
dc.language.isoen_US
dc.relation.isAvailableAtDigital Repository at the University of Marylanden_US
dc.relation.isAvailableAtUniversity of Maryland (College Park, Md.)en_US
dc.relation.isAvailableAtTech Reports in Computer Science and Engineeringen_US
dc.relation.isAvailableAtUMIACS Technical Reportsen_US
dc.relation.ispartofseriesUM Computer Science Department; CS-TR-3571en_US
dc.relation.ispartofseriesUMIACS; UMIACS-TR-95-118en_US
dc.titleMinimizing Communication while Preserving Parallelismen_US
dc.typeTechnical Reporten_US

Files

Original bundle
Now showing 1 - 2 of 2
No Thumbnail Available
Name:
CS-TR-3571.ps
Size:
199.66 KB
Format:
Postscript Files
Loading...
Thumbnail Image
Name:
CS-TR-3571.pdf
Size:
246.31 KB
Format:
Adobe Portable Document Format
Description:
Auto-generated copy of CS-TR-3571.ps