Exploiting Multi-Loop Parallelism on Heterogeneous Microprocessors
Date: 2016-11-10
Authors: Zuzak, Michael; Yeung, Donald
Abstract
Heterogeneous microprocessors integrate CPUs and GPUs on the same chip,
providing fast CPU-GPU communication and enabling cores to compute on
data "in place." These advantages will permit integrated GPUs to exploit
a smaller unit of parallelism. But one challenge will be exposing
sufficient parallelism to keep all of the on-chip compute resources
fully utilized. In this paper, we argue that integrated CPU-GPU chips
should exploit parallelism from multiple loops simultaneously. One
example of this is nested parallelism in which one or more inner SIMD
loops are nested underneath a parallel outer (non-SIMD) loop. By
scheduling the parallel outer loop on multiple CPU cores, multiple
dynamic instances of the inner SIMD loops can be scheduled on the GPU
cores. This boosts GPU utilization and parallelizes the non-SIMD code.
Our preliminary results show that exploiting such multi-loop parallelism
provides a 3.12x performance gain over exploiting parallelism from
individual loops one at a time.
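To make the nested-parallelism pattern concrete, the following is a minimal sketch written with standard OpenMP offload pragmas. The abstract does not describe the authors' actual runtime or scheduling mechanism, so the pragmas, function name, and array names here are illustrative assumptions, not the paper's implementation: the outer (non-SIMD) loop is spread across CPU threads, and each thread launches its own dynamic instance of the inner SIMD loop on the integrated GPU.

```c
/* Illustrative sketch only: nested (multi-loop) parallelism expressed with
 * generic OpenMP offload directives. Names and the update performed in the
 * inner loop are hypothetical, not taken from the paper. */
#include <stddef.h>

void multi_loop(float *a, const float *b, size_t n_outer, size_t n_inner)
{
    /* Parallel outer (non-SIMD) loop: iterations are distributed across
     * CPU cores. */
    #pragma omp parallel for
    for (size_t i = 0; i < n_outer; i++) {
        float       *row_a = a + i * n_inner;
        const float *row_b = b + i * n_inner;

        /* Inner SIMD loop: each CPU thread offloads its own dynamic
         * instance to the integrated GPU, so multiple instances can
         * occupy the GPU cores at the same time. */
        #pragma omp target teams distribute parallel for \
                map(tofrom: row_a[0:n_inner]) map(to: row_b[0:n_inner])
        for (size_t j = 0; j < n_inner; j++) {
            row_a[j] += 2.0f * row_b[j];
        }
    }
}
```

Because the outer loop iterations run concurrently on the CPU cores, several inner-loop kernels are in flight on the GPU at once, which is the utilization effect the abstract attributes to exploiting parallelism from multiple loops simultaneously rather than from one loop at a time.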