Show simple item record

dc.contributor.author     Zuzak, Michael
dc.contributor.author     Yeung, Donald
dc.date.accessioned       2016-11-14T02:55:46Z
dc.date.available         2016-11-14T02:55:46Z
dc.date.issued            2016-11-10
dc.identifier             https://doi.org/10.13016/M2FR55
dc.identifier.uri         http://hdl.handle.net/1903/18886
dc.description.abstract   Heterogeneous microprocessors integrate CPUs and GPUs on the same chip, providing fast CPU-GPU communication and enabling cores to compute on data "in place." These advantages will permit integrated GPUs to exploit a smaller unit of parallelism. But one challenge will be exposing sufficient parallelism to keep all of the on-chip compute resources fully utilized. In this paper, we argue that integrated CPU-GPU chips should exploit parallelism from multiple loops simultaneously. One example of this is nested parallelism in which one or more inner SIMD loops are nested underneath a parallel outer (non-SIMD) loop. By scheduling the parallel outer loop on multiple CPU cores, multiple dynamic instances of the inner SIMD loops can be scheduled on the GPU cores. This boosts GPU utilization and parallelizes the non-SIMD code. Our preliminary results show exploiting such multi-loop parallelism provides a 3.12x performance gain over exploiting parallelism from individual loops one at a time.
dc.language.iso           en_US
dc.relation.ispartofseries  UMIACS; UMIACS-TR-2016-01
dc.relation.ispartofseries  UM Computer Science Department; CS-TR-5052
dc.title                  Exploiting Multi-Loop Parallelism on Heterogeneous Microprocessors
dc.type                   Technical Report

