Pipelined CPU-GPU Scheduling for Caches

dc.contributor.author: Gerzhoy, Daniel
dc.contributor.author: Yeung, Donald
dc.date.accessioned: 2021-03-29T16:05:14Z
dc.date.available: 2021-03-29T16:05:14Z
dc.date.issued: 2021-03-23
dc.description.abstract: Heterogeneous microprocessors integrate a CPU and GPU with a shared cache hierarchy on the same chip, affording low-overhead communication between the CPU and GPU cores. Often, large array data structures are communicated from the CPU to the GPU and back. While the on-chip cache hierarchy can support such CPU-GPU producer-consumer sharing, sharing through the cache almost never happens due to poor temporal reuse. Because the data structures can be quite large, by the time the consumer reads the data, it has been evicted from the cache even though the producer had brought it on-chip when it originally wrote the data. As a result, the CPU-GPU communication happens through main memory instead of the cache, hurting performance and energy. This paper exploits the on-chip caches in a heterogeneous microprocessor to improve CPU-GPU communication efficiency. We divide streaming computations executed by the CPU and GPU that exhibit producer-consumer sharing into chunks, and overlap the execution of CPU chunks with GPU chunks in a software pipeline. To enforce data dependences, the producer executes one chunk ahead of the consumer at all times. We also propose a low-overhead synchronization mechanism in which the CPU directly controls thread-block scheduling in the GPU to maintain the producer's "run-ahead distance" relative to the consumer. By adjusting the chunk size or run-ahead distance, we can make the CPU-GPU working set fit in the last-level cache (LLC), thus permitting the producer-consumer sharing to occur through the LLC. We show through simulation that our technique reduces the number of DRAM accesses by 30.4%, improves performance by 26.8%, and lowers memory system energy by 27.4% on average across 7 benchmarks. [en_US]
dc.identifier: https://doi.org/10.13016/vpzu-qzh7
dc.identifier.uri: http://hdl.handle.net/1903/26933
dc.language.iso: en_US [en_US]
dc.relation.ispartofseries: Technical Reports from UMIACS;UMIACS-TR-2021-01
dc.relation.ispartofseries: UM Computer Science Department;CS-TR-5065
dc.title: Pipelined CPU-GPU Scheduling for Caches [en_US]
dc.type: Technical Report [en_US]
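
The chunked software pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `pipeline_schedule` and `max_chunk_bytes` are hypothetical, and the working-set model (run-ahead distance plus one chunks live at a time) is a simplifying assumption made here for illustration only.

```python
from typing import List, Optional, Tuple


def pipeline_schedule(
    num_chunks: int, run_ahead: int = 1
) -> List[Tuple[Optional[int], Optional[int]]]:
    """Return one (producer_chunk, consumer_chunk) pair per pipeline step.

    The producer stays `run_ahead` chunks ahead of the consumer.
    None means that side is idle (pipeline fill or drain).
    """
    if num_chunks < 0 or run_ahead < 1:
        raise ValueError("num_chunks must be >= 0 and run_ahead >= 1")
    steps = []
    for t in range(num_chunks + run_ahead):
        producer = t if t < num_chunks else None
        consumer = t - run_ahead if t >= run_ahead else None
        steps.append((producer, consumer))
    return steps


def max_chunk_bytes(llc_bytes: int, run_ahead: int = 1) -> int:
    """Largest chunk size whose live working set fits in the LLC.

    Assumption (not from the report): while the consumer reads chunk i,
    the producer writes chunk i + run_ahead, so chunks i..i + run_ahead
    (run_ahead + 1 chunks in total) must be resident at once.
    """
    if llc_bytes <= 0 or run_ahead < 1:
        raise ValueError("llc_bytes must be > 0 and run_ahead >= 1")
    return llc_bytes // (run_ahead + 1)


if __name__ == "__main__":
    for producer, consumer in pipeline_schedule(4, run_ahead=1):
        print(f"producer chunk: {producer}, consumer chunk: {consumer}")
    print(max_chunk_bytes(8 * 1024 * 1024, run_ahead=1))
```

With a run-ahead distance of 1, the producer works on chunk i+1 while the consumer works on chunk i, so shrinking either the chunk size or the run-ahead distance shrinks the data that must stay cached between production and consumption.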

Files

Name: UMIACS-TR-2021-01.pdf
Size: 703.29 KB
Format: Adobe Portable Document Format