Exploiting Multi-Loop Parallelism on Heterogeneous Microprocessors

Zuzak, Michael; Yeung, Donald

Files

UMIACS-TR-2016-01.pdf (185.64 KB)

Date

2016-11-10

Authors

Zuzak, Michael

Yeung, Donald

DRUM DOI

https://doi.org/10.13016/M2FR55

Abstract

Heterogeneous microprocessors integrate CPUs and GPUs on the same chip, providing fast CPU-GPU communication and enabling cores to compute on data "in place." These advantages will permit integrated GPUs to exploit a smaller unit of parallelism. But one challenge will be exposing sufficient parallelism to keep all of the on-chip compute resources fully utilized. In this paper, we argue that integrated CPU-GPU chips should exploit parallelism from multiple loops simultaneously. One example of this is nested parallelism in which one or more inner SIMD loops are nested underneath a parallel outer (non-SIMD) loop. By scheduling the parallel outer loop on multiple CPU cores, multiple dynamic instances of the inner SIMD loops can be scheduled on the GPU cores. This boosts GPU utilization and parallelizes the non-SIMD code. Our preliminary results show exploiting such multi-loop parallelism provides a 3.12x performance gain over exploiting parallelism from individual loops one at a time.
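
The nested-parallelism pattern described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the sample data, and the use of a thread pool as a stand-in for CPU cores are all assumptions made for illustration, and the inner loop is ordinary Python rather than a real GPU kernel. The sketch shows only the loop structure, in which each outer iteration does some non-SIMD work and then issues its own instance of the inner data-parallel loop, so that several inner-loop instances are in flight at once.

```python
from concurrent.futures import ThreadPoolExecutor


def inner_simd_loop(row, scale):
    """Hypothetical inner SIMD loop: the same operation on every element.

    On an integrated CPU-GPU chip this is the part that would be
    offloaded to the GPU cores; here it is plain Python.
    """
    return [scale * x for x in row]


def outer_iteration(row):
    """One iteration of the parallel outer (non-SIMD) loop.

    The scan below is data-dependent and exits early, so it does not
    map well onto SIMD hardware and stays on a CPU core.
    """
    scale = 1
    for x in row:
        if x < 0:
            break
        scale += 1
    # Each outer iteration launches its own instance of the inner loop.
    return inner_simd_loop(row, scale)


def run_one_loop_at_a_time(rows):
    """Baseline: outer iterations run serially, one inner loop at a time."""
    return [outer_iteration(row) for row in rows]


def run_multi_loop(rows, cpu_workers=4):
    """Multi-loop version: outer iterations are spread across workers
    (assumed stand-ins for CPU cores), so several inner-loop instances
    can be outstanding simultaneously."""
    with ThreadPoolExecutor(max_workers=cpu_workers) as pool:
        return list(pool.map(outer_iteration, rows))


# Hypothetical sample data, one row per outer-loop iteration.
rows = [[1, 2, -3, 4], [5, 6, 7, 8], [-1, 0, 2, 3]]
```

Both schedules compute the same result; only the amount of work exposed at once differs. Because of Python's global interpreter lock, this sketch will not reproduce any speedup, and it says nothing about the 3.12x gain reported in the abstract.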

URI (handle)

http://hdl.handle.net/1903/18886

Collections

Technical Reports from UMIACS
