Skip to content
University of Maryland LibrariesDigital Repository at the University of Maryland
    • Login
    View Item 
    •   DRUM
    • College of Computer, Mathematical & Natural Sciences
    • Computer Science
    • Technical Reports from UMIACS
    • View Item
    •   DRUM
    • College of Computer, Mathematical & Natural Sciences
    • Computer Science
    • Technical Reports from UMIACS
    • View Item
    JavaScript is disabled for your browser. Some features of this site may not work without it.

    Understanding Multicore Cache Behavior of Loop-based Parallel Programs via Reuse Distance Analysis

    Thumbnail
    View/Open
    UMIACS-TR-2012-01.pdf (752.5Kb)
    No. of downloads: 697

    Date
    2012-01-17
    Author
    Wu, Meng-Ju
    Yeung, Donald
    Metadata
    Show full item record
    Abstract
    Understanding multicore memory behavior is crucial, but can be challenging due to the cache hierarchies employed in modern CPUs. In today's hierarchies, performance is determined by complex thread interactions, such as interference in shared caches and replication and communication in private caches. Researchers normally perform simulation to sort out these interactions, but this can be costly and not very insightful. An alternative is reuse distance (RD) analysis. RD analysis for multicore processors is becoming feasible because recent research has developed new notions of reuse distance that can analyze thread interactions. In particular, concurrent reuse distance (CRD) models shared cache interference, while private-stack reuse distance (PRD) models private cache replication and communication. Previous multicore RD research has centered around developing techniques and verifying accuracy. In this paper, we apply multicore RD analysis to better understand memory behavior. We focus on loop-based parallel programs, an important class of programs for which RD analysis provides high accuracy. First, we develop techniques to isolate thread interactions, permitting analysis of their relative contributions. Then, we use our techniques to extract several new insights that can help architects optimize multicore cache hierarchies. One of our findings is that data sharing in parallel loops varies with reuse distance, becoming significant only at larger RD values. This implies capacity sharing in shared caches and replication/communication in private caches occur only beyond some capacity. We define Cshare to be the turn-on capacity for data sharing, and study its impact on private vs. shared cache performance. In addition, we find machine scaling degrades locality at smaller RD values and increases sharing frequency (i.e., reduces Cshare). We characterize how these effects vary with core count, and study their impact on the preference for private vs. shared caches.
    URI
    http://hdl.handle.net/1903/12438
    Collections
    • Technical Reports from UMIACS

    DRUM is brought to you by the University of Maryland Libraries
    University of Maryland, College Park, MD 20742-7011 (301)314-1328.
    Please send us your comments.
    Web Accessibility
     

     

    Browse

    All of DRUMCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

    My Account

    LoginRegister
    Pages
    About DRUMAbout Download Statistics

    DRUM is brought to you by the University of Maryland Libraries
    University of Maryland, College Park, MD 20742-7011 (301)314-1328.
    Please send us your comments.
    Web Accessibility