Skip to content
University of Maryland LibrariesDigital Repository at the University of Maryland
    • Login
    View Item 
    •   DRUM
    • College of Computer, Mathematical & Natural Sciences
    • Computer Science
    • Technical Reports from UMIACS
    • View Item
    •   DRUM
    • College of Computer, Mathematical & Natural Sciences
    • Computer Science
    • Technical Reports from UMIACS
    • View Item
    JavaScript is disabled for your browser. Some features of this site may not work without it.

    Memory Performance Analysis for Parallel Programs Using Concurrent Reuse Distance

    Thumbnail
    View/Open
    UMIACS-TR-2010-10.pdf (423.4Kb)
    No. of downloads: 594

    Date
    2010-10-05
    Author
    Wu, Meng-Ju
    Yeung, Donald
    Metadata
    Show full item record
    Abstract
    Performance on multicore processors is determined largely by on-chip cache. Computer architects have conducted numerous studies in the past that vary core count and cache capacity as well as problem size to understand impact on cache behavior. These studies are very costly due to the combinatorial design spaces they must explore. Reuse distance (RD) analysis can help architects explore multicore cache performance more efficiently. One problem, however, is multicore RD analysis requires measuring concurrent reuse distance (CRD) profiles across thread-interleaved memory reference streams. Sensitivity to memory interleaving makes CRD profiles architecture dependent, undermining RD analysis benefits. But for parallel programs with symmetric threads, CRD profiles vary with architecture tractably: they change only slightly with cache capacity scaling, and shift predictably to larger CRD values with core count scaling. This enables analysis of a large number of multicore configurations from a small set of measured CRD profiles. This paper investigates using RD analysis to efficiently analyze multicore cache performance for parallel programs, making several contributions. First, we characterize how CRD profiles change with core count and cache capacity. One of our findings is core count scaling degrades locality, but the degradation only impacts last-level caches (LLCs) below 16MB for our benchmarks and problem sizes, increasing to 128MB if problem size scales by 64x. Second, we apply reference groups to predict CRD profiles across core count scaling, and evaluate prediction accuracy. Finally, we use CRD profiles to analyze multicore cache performance. We find predicted CRD profiles can estimate LLC MPKI within 76% of simulation for configurations without pathologic cache conflicts in 1/1200th the time to perform simulation of the full design space.
    URI
    http://hdl.handle.net/1903/10748
    Collections
    • Technical Reports from UMIACS

    DRUM is brought to you by the University of Maryland Libraries
    University of Maryland, College Park, MD 20742-7011 (301)314-1328.
    Please send us your comments.
    Web Accessibility
     

     

    Browse

    All of DRUMCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

    My Account

    LoginRegister
    Pages
    About DRUMAbout Download Statistics

    DRUM is brought to you by the University of Maryland Libraries
    University of Maryland, College Park, MD 20742-7011 (301)314-1328.
    Please send us your comments.
    Web Accessibility