Looking to Parallel Algorithms for ILP and Decentralization

Berkovich, Efraim; Jacob, Bruce; Nuzman, Joseph; Vishkin, Uzi

Looking to Parallel Algorithms for ILP and Decentralization

dc.contributor.author	Berkovich, Efraim	en_US
dc.contributor.author	Jacob, Bruce	en_US
dc.contributor.author	Nuzman, Joseph	en_US
dc.contributor.author	Vishkin, Uzi	en_US
dc.date.accessioned	2004-05-31T21:07:47Z
dc.date.available	2004-05-31T21:07:47Z
dc.date.created	1998-07-20	en_US
dc.date.issued	1998-10-15	en_US
dc.description.abstract	We introduce explicit multi-threading (XMT), a decentralized architecture that exploits fine-grained SPMD-style programming; a SPMD program can translate directly to MIPS assembly language using three additional instruction primitives. The motivation for XMT is: (i) to define an inherently decentralizable architecture, taking into account that the performance of future integrated circuits will be dominated by wire costs, (ii) to increase available instruction-level parallelism (ILP) by leveraging expertise in the world of parallel algorithms, and (iii) to reduce hardware complexity by alleviating the need to detect ILP at run-time: if parallel algorithms can give us an overabundance of work to do in the form of thread-level parallelism, one can extract instruction-level parallelism with greatly simplified dependence-checking. We show that implementations of such an architecture tend towards decentralization and that, when global communication is necessary, overall performance is relatively insensitive to large on-chip delays. We compare the performance of the design to more traditional parallel architectures and to a high-performance superscalar implementation, but the intent is merely to illustrate the performance behavior of the organization and to stimulate debate on the viability of introducing SPMD to the single-chip processor domain. We cannot offer at this stage hard comparisons with well-researched models of execution. When programming for the SPMD model, the total number of operations that the processor has to perform is often slightly higher. To counter this, we have observed that the length of the critical path through the dynamic execution graph is smaller than in the serial domain, and the amount of ILP is correspondingly larger. Fine-grained SPMD programming connects with a broad knowledge base in parallel algorithms and scales down to provide good performance relative to high-performance superscalar designs even with small input sizes and small numbers of functional units. Keywords: Fine-grained SPMD, parallel algorithms. spawn-join, prefix-sum, instruction-level parallelism, decentralized architecture. (Also cross-referenced as UMIACS-TR- 98-40)	en_US
dc.format.extent	469924 bytes
dc.format.mimetype	application/postscript
dc.identifier.uri	http://hdl.handle.net/1903/496
dc.language.iso	en_US
dc.relation.isAvailableAt	Digital Repository at the University of Maryland	en_US
dc.relation.isAvailableAt	University of Maryland (College Park, Md.)	en_US
dc.relation.isAvailableAt	Tech Reports in Computer Science and Engineering	en_US
dc.relation.isAvailableAt	Computer Science Department Technical Reports	en_US
dc.relation.ispartofseries	UM Computer Science Department; CS-TR-3921	en_US
dc.title	Looking to Parallel Algorithms for ILP and Decentralization	en_US
dc.type	Technical Report	en_US

Files

Original bundle

Now showing 1 - 2 of 2

Name:: CS-TR-3921.ps
Size:: 458.91 KB
Format:: Postscript Files

Download

Name:: CS-TR-3921.pdf
Size:: 123.26 KB
Format:: Adobe Portable Document Format
Description:: Auto-generated copy of CS-TR-3921.ps

Download

Collections

Technical Reports of the Computer Science Department