Exploiting Application-Level Information to Reduce Memory Bandwidth Consumption

dc.contributor.author: Deepak Agarwal
dc.contributor.author: Donald Yeung
dc.date.accessioned: 2004-05-31T23:20:09Z
dc.date.available: 2004-05-31T23:20:09Z
dc.date.created: 2002-07
dc.date.issued: 2002-08-01
dc.description.abstract: As processors continue to deliver higher levels of performance and as memory latency tolerance techniques become widespread to address the increasing cost of accessing memory, memory bandwidth will emerge as a major performance bottleneck. Rather than relying solely on wider and faster memories to address memory bandwidth shortages, an alternative is to use existing memory bandwidth more efficiently. A promising approach is hardware-based selective sub-blocking. In this technique, hardware predictors track the portions of cache blocks that are referenced by the processor. On a cache miss, the predictors are consulted and only previously referenced portions are fetched into the cache, thus conserving memory bandwidth.

This paper proposes a software-centric approach to selective sub-blocking. We make the key observation that wasteful data fetching inside long cache blocks arises due to certain sparse memory references, and that such memory references can be identified in the application source code. Rather than using hardware predictors to discover sparse memory reference patterns from the dynamic memory reference stream, our approach relies on the programmer or compiler to identify the sparse memory references statically, and to use special annotated memory instructions to specify the amount of spatial reuse associated with such memory references. At runtime, the size annotations select the amount of data to fetch on each cache miss, thus fetching only data that will likely be accessed by the processor.

Our results show that annotated memory instructions remove between 54% and 71% of cache traffic for 7 applications, reducing more traffic than hardware selective sub-blocking using a 32 Kbyte predictor on all applications, and reducing as much traffic as hardware selective sub-blocking using an 8 Mbyte predictor on 5 out of 7 applications. Overall, annotated memory instructions achieve a 17% performance gain when used alone, and a 22.3% performance gain when combined with software prefetching, compared to a 7.2% performance degradation when prefetching without annotated memory instructions. This is an updated version of CS-TR-4304. Also UMIACS-TR-2002-64.
dc.format.extent: 1227077 bytes
dc.format.mimetype: application/postscript
dc.identifier.uri: http://hdl.handle.net/1903/1215
dc.language.iso: en_US
dc.relation.isAvailableAt: Digital Repository at the University of Maryland
dc.relation.isAvailableAt: University of Maryland (College Park, Md.)
dc.relation.isAvailableAt: Tech Reports in Computer Science and Engineering
dc.relation.isAvailableAt: UMIACS Technical Reports
dc.relation.ispartofseries: UM Computer Science Department; CS-TR-4384
dc.relation.ispartofseries: CS-TR-4304
dc.relation.ispartofseries: UMIACS; UMIACS-TR-2002-64
dc.title: Exploiting Application-Level Information to Reduce Memory Bandwidth Consumption
dc.type: Technical Report

Files

Original bundle
Name: CS-TR-4384.ps
Size: 1.17 MB
Format: PostScript
Name: CS-TR-4384.pdf
Size: 244.53 KB
Format: Adobe Portable Document Format
Description: Auto-generated copy of CS-TR-4384.ps