Title: Exploiting Application-Level Information to Reduce Memory Bandwidth Consumption
Authors: Deepak Agarwal, Donald Yeung
Type: Technical Report
Language: en-US

Abstract: As processors continue to deliver ever higher levels of performance and as memory latency tolerance techniques become widespread to address the increasing cost of accessing memory, memory bandwidth will emerge as a major limitation to continued increases in application performance. In this paper, we propose a hybrid hardware/software technique for addressing the memory bandwidth bottleneck by more intelligently transferring data between the memory system and cache. Our approach uses off-line analysis of the source code and special annotated memory instructions to convey spatial locality information to the hardware at runtime. The memory system uses this information to fetch only the data that will be accessed by the program; data that is unlikely to be referenced is not fetched, hence reducing the application's memory traffic. Our technique uses modified sectored caches to fetch and cache the variable-sized fine-grained data accessed through annotated memory instructions. Our results show that annotated memory instructions remove between 20% and 59% of the cache traffic for 7 applications. Furthermore, annotated memory instructions achieve a 13% performance gain on a cycle-accurate simulator when used alone, and a 26.4% performance gain when combined with software prefetching, compared to a 2.3% performance degradation when prefetching with normal memory instructions.

Note: This has been replaced by CS-TR-4384. (Also referenced as UMIACS-TR-2001-81)
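
The abstract's central idea, fetching only the data an annotated memory instruction is expected to touch rather than a whole cache line, can be illustrated with a toy traffic count. The sketch below is not the report's simulator or hardware design. The 64-byte line, the 8-byte sector, the cold-miss-only cache with no evictions, the function names, and the access pattern are all assumptions chosen for illustration.

```python
# Toy model of memory traffic: whole-line fetch vs. sector-granularity fetch.
# All parameters and names here are illustrative assumptions, not values
# taken from the report.

LINE_SIZE = 64    # bytes per cache line (assumed)
SECTOR_SIZE = 8   # bytes per sector within a line (assumed)


def traffic_full_line(addresses):
    """Bytes fetched when every cold miss brings in a whole cache line.

    The cache is assumed large enough that nothing is ever evicted,
    so each distinct line is fetched exactly once.
    """
    fetched_lines = {addr // LINE_SIZE for addr in addresses}
    return len(fetched_lines) * LINE_SIZE


def traffic_sectored(accesses):
    """Bytes fetched when each access fetches only the sectors it is hinted to need.

    `accesses` is an iterable of (address, hint_bytes) pairs, where
    hint_bytes stands in for the spatial-locality information that an
    annotated memory instruction would carry (hypothetical encoding).
    """
    fetched_sectors = set()
    for addr, hint_bytes in accesses:
        first = addr // SECTOR_SIZE
        last = (addr + hint_bytes - 1) // SECTOR_SIZE
        fetched_sectors.update(range(first, last + 1))
    return len(fetched_sectors) * SECTOR_SIZE


# Assumed access pattern: read one 8-byte field from each of 1000
# records that are 32 bytes apart, so most of every line goes unused.
RECORD_SIZE = 32
FIELD_SIZE = 8
addresses = [i * RECORD_SIZE for i in range(1000)]

full = traffic_full_line(addresses)
sectored = traffic_sectored((a, FIELD_SIZE) for a in addresses)
print(f"full-line fetch: {full} bytes")
print(f"sectored fetch:  {sectored} bytes")
print(f"traffic removed: {100 * (full - sectored) / full:.0f}%")
```

Under these assumed parameters the full-line policy fetches 32000 bytes and the sectored policy 8000 bytes. These figures describe only this toy access pattern and are unrelated to the 20% to 59% reductions the report measures on real applications.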