Skip to content
University of Maryland LibrariesDigital Repository at the University of Maryland
    • Login
    View Item 
    •   DRUM
    • College of Computer, Mathematical & Natural Sciences
    • Computer Science
    • Technical Reports from UMIACS
    • View Item
    •   DRUM
    • College of Computer, Mathematical & Natural Sciences
    • Computer Science
    • Technical Reports from UMIACS
    • View Item
    JavaScript is disabled for your browser. Some features of this site may not work without it.

    Exploiting Application-Level Information to Reduce Memory Bandwidth Consumption

    Thumbnail
    View/Open
    CS-TR-4384.ps (1.170Mb)
    No. of downloads: 216

    Auto-generated copy of CS-TR-4384.ps (244.5Kb)
    No. of downloads: 658

    Date
    2002-08-01
    Author
    Deepak Agarwal
    Donald Yeung
    Metadata
    Show full item record
    Abstract
    s processors continue to deliver higher levels of performance and as memory latency toler-ance techniques become widespread to address the increasing cost of accessing memory, memory bandwidth will emerge as a major performance bottleneck. Rather than rely solely on widerand faster memories to address memory bandwidth shortages, an alternative is to use existing memory bandwidth more efficiently. A promising approach is hardware-based selective sub-blocking. In this technique, hardware predictors track the portions of cache blocks that are referenced by the processor. On a cache miss, the predictors are consulted and only previ-ously referenced portions are fetched into the cache, thus conserving memory bandwidth. This paper proposes a software-centric approch to selective sub-blocking. We make the keyobservation that wasteful data fetching inside long cache blocks arises due to certain sparse memory references, and that such memory references can be identified in the application sourcecode. Rather than use hardware predictors to discover sparse memory reference patterns from the dynamic memory reference stream, our approach relies on the programmer or compiler toidentify the sparse memory references statically, and to use special annotated memory instructions to specify the amount of spatial reuse associated with such memory references. At runtime,the size annotations select the amount of data to fetch on each cache miss, thus fetching only data that will likely be accessed by the processor. Our results show annotated memory instruc-tions remove between 54% and 71% of cache traffic for 7 applications, reducing more traffic than hardware selective sub-blocking using a 32 Kbyte predictor on all applications, and reducingas much traffic as hardware selective sub-blocking using an 8 Mbyte predictor on 5 out of 7 applications. Overall, annotated memory instructions achieve a 17% performance gain whenused alone, and a 22.3% performance gain when combined with software prefetching, compared to a 7.2% performance degradation when prefetching without annotated memory instructions. This is an updated version of CS-TR-4304. Also UMIACS-TR-2002-64
    URI
    http://hdl.handle.net/1903/1215
    Collections
    • Technical Reports from UMIACS
    • Technical Reports of the Computer Science Department

    DRUM is brought to you by the University of Maryland Libraries
    University of Maryland, College Park, MD 20742-7011 (301)314-1328.
    Please send us your comments.
    Web Accessibility
     

     

    Browse

    All of DRUMCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

    My Account

    LoginRegister
    Pages
    About DRUMAbout Download Statistics

    DRUM is brought to you by the University of Maryland Libraries
    University of Maryland, College Park, MD 20742-7011 (301)314-1328.
    Please send us your comments.
    Web Accessibility