Exploiting Application-Level Information to Reduce Memory Bandwidth Consumption
Abstract
As processors continue to deliver ever higher levels of performance
and as memory latency tolerance techniques become widespread to
address the increasing cost of accessing memory, memory bandwidth will
emerge as a major limitation to continued increases in application
performance. In this paper, we propose a hybrid hardware/software
technique for addressing the memory bandwidth bottleneck by more
intelligently transferring data between the memory system and cache.
Our approach uses off-line analysis of the source code and special
annotated memory instructions to convey spatial locality information
to the hardware at runtime. The memory system uses this information
to fetch only the data that the program will access; data that is
unlikely to be referenced is not fetched, which reduces the
application's memory traffic. Our technique uses modified sectored
caches to fetch and cache the variable-sized fine-grained data
accessed through annotated memory instructions. Our results show that
annotated memory instructions remove between 20% and 59% of the
cache traffic across seven applications. Furthermore, annotated memory
instructions achieve a 13% performance gain on a cycle-accurate
simulator when used alone, and a 26.4% performance gain when combined
with software prefetching, compared to a 2.3% performance degradation
when prefetching with normal memory instructions.
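The abstract describes the mechanism only at a high level. As a rough illustration (not the authors' hardware design), the Python sketch below models a direct-mapped sectored cache in which a normal load fetches the whole line, while an annotated load carries a hypothetical `size_hint` saying how many bytes from the load address the program will actually touch, so only the sectors covering that range are fetched. The line size (64 bytes), sector size (8 bytes), cache size (256 lines), and the `SectoredCache`/`traffic` names are all assumptions made for illustration.

```python
LINE_SIZE = 64      # bytes per cache line (assumed)
SECTOR_SIZE = 8     # bytes per sector (assumed)
SECTORS_PER_LINE = LINE_SIZE // SECTOR_SIZE


class SectoredCache:
    """Direct-mapped cache whose lines are split into independently valid sectors."""

    def __init__(self, num_lines=256):
        self.num_lines = num_lines
        self.tags = [None] * num_lines   # tag stored per line (None = empty)
        self.valid = [0] * num_lines     # bitmask of valid sectors per line
        self.bytes_fetched = 0           # memory-to-cache traffic in bytes

    def access(self, addr, size_hint=None):
        """Load from addr and return True on a hit.

        size_hint=None models a normal load (fetch the whole line).
        size_hint=n models a hypothetical annotated load promising that the
        program will touch only n bytes starting at addr within this line.
        """
        line_addr = addr // LINE_SIZE
        index = line_addr % self.num_lines
        tag = line_addr // self.num_lines
        offset = addr % LINE_SIZE

        if size_hint is None:
            first, last = 0, SECTORS_PER_LINE - 1
        else:
            end = min(offset + max(size_hint, 1), LINE_SIZE) - 1  # clamp to line
            first, last = offset // SECTOR_SIZE, end // SECTOR_SIZE
        wanted = 0
        for s in range(first, last + 1):
            wanted |= 1 << s

        if self.tags[index] != tag:      # tag mismatch: replace line, invalidate sectors
            self.tags[index] = tag
            self.valid[index] = 0
        missing = wanted & ~self.valid[index]
        if missing:                      # fetch only the sectors not yet cached
            self.bytes_fetched += bin(missing).count("1") * SECTOR_SIZE
            self.valid[index] |= missing
        return missing == 0


def traffic(annotated, records=1000, record_size=64, field_size=8):
    """Read one 8-byte field from each 64-byte record; return bytes fetched."""
    cache = SectoredCache()
    for i in range(records):
        cache.access(i * record_size, field_size if annotated else None)
    return cache.bytes_fetched


print(traffic(annotated=False), traffic(annotated=True))
```

With these assumed parameters the normal loads move 64,000 bytes and the annotated loads 8,000 bytes. The 8:1 ratio follows entirely from the chosen record and sector sizes in this toy access pattern and is not a result from the report.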
This report has been superseded by CS-TR-4384.
(Also referenced as UMIACS-TR-2001-81)