Exploiting Multi-Loop Parallelism on Heterogeneous Microprocessors

dc.contributor.author: Zuzak, Michael
dc.contributor.author: Yeung, Donald
dc.date.accessioned: 2016-11-14T02:55:46Z
dc.date.available: 2016-11-14T02:55:46Z
dc.date.issued: 2016-11-10
dc.description.abstract (en_US): Heterogeneous microprocessors integrate CPUs and GPUs on the same chip, providing fast CPU-GPU communication and enabling cores to compute on data "in place." These advantages will permit integrated GPUs to exploit smaller units of parallelism. One challenge, however, will be exposing enough parallelism to keep all of the on-chip compute resources fully utilized. In this paper, we argue that integrated CPU-GPU chips should exploit parallelism from multiple loops simultaneously. One example is nested parallelism, in which one or more inner SIMD loops are nested underneath a parallel outer (non-SIMD) loop. By scheduling the parallel outer loop on multiple CPU cores, multiple dynamic instances of the inner SIMD loops can be scheduled on the GPU cores at the same time. This boosts GPU utilization and also parallelizes the non-SIMD code. Our preliminary results show that exploiting such multi-loop parallelism provides a 3.12x performance gain over exploiting parallelism from individual loops one at a time.
dc.identifier: https://doi.org/10.13016/M2FR55
dc.identifier.uri: http://hdl.handle.net/1903/18886
dc.language.iso (en_US): en_US
dc.relation.ispartofseries: UMIACS;UMIACS-TR-2016-01
dc.relation.ispartofseries: UM Computer Science Department;CS-TR-5052
dc.title (en_US): Exploiting Multi-Loop Parallelism on Heterogeneous Microprocessors
dc.type (en_US): Technical Report
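
The nested-parallelism schedule described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions. One Python thread pool stands in for the CPU cores that run the parallel outer loop. A second pool stands in for the GPU, which executes the inner SIMD loop instances. Each submit to that second pool represents one kernel launch. The loop bodies (`inner_simd_loop`, `outer_iteration`), the driver names (`multi_loop`, `one_loop_at_a_time`), and the pool sizes are all hypothetical and do not come from the report. Python's global interpreter lock also means this shows only the scheduling structure. It does not reproduce the reported 3.12x gain.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import List


def inner_simd_loop(row: List[float]) -> List[float]:
    # Hypothetical inner SIMD loop: the same operation on every element with
    # no cross-iteration dependences, so one instance maps to one GPU launch.
    return [2.0 * x + 1.0 for x in row]


def outer_iteration(row: List[float], gpu: Executor) -> float:
    # Hypothetical non-SIMD outer-loop body, executed on a "CPU" thread.
    scale = max(row, default=1.0) or 1.0
    # Issue this dynamic instance of the inner loop to the shared "GPU" pool
    # and wait for it; other CPU threads may have their instances in flight.
    simd_result = gpu.submit(inner_simd_loop, row).result()
    return sum(simd_result) / scale


def multi_loop(rows: List[List[float]], cpu_workers: int = 4,
               gpu_workers: int = 8) -> List[float]:
    # Outer loop parallelized across "CPU" threads, so up to cpu_workers
    # inner-loop instances can occupy the "GPU" pool at once.
    with ThreadPoolExecutor(max_workers=gpu_workers) as gpu, \
         ThreadPoolExecutor(max_workers=cpu_workers) as cpu:
        futures = [cpu.submit(outer_iteration, row, gpu) for row in rows]
        return [f.result() for f in futures]


def one_loop_at_a_time(rows: List[List[float]],
                       gpu_workers: int = 8) -> List[float]:
    # Baseline: the outer loop runs serially on one thread, so only one
    # inner-loop instance is ever queued on the "GPU" pool.
    with ThreadPoolExecutor(max_workers=gpu_workers) as gpu:
        return [outer_iteration(row, gpu) for row in rows]


if __name__ == "__main__":
    rows = [[float(i + j) for j in range(16)] for i in range(32)]
    print(multi_loop(rows) == one_loop_at_a_time(rows))  # True
```

In this sketch, the outer loop rather than the inner loop supplies the extra concurrency that keeps the GPU stand-in busy. Both drivers compute identical results. The difference is how many inner-loop instances can be outstanding at the same time.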

Files

Original bundle
Name: UMIACS-TR-2016-01.pdf
Size: 185.64 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.57 KB
Format: Item-specific license agreed to upon submission