Please use this identifier to cite or link to this item:
http://hdl.handle.net/1903/7453
| Title: | Last Level Cache (LLC) Performance of Data Mining Workloads On a CMP — Case Study of Parallel Bioinformatics Workloads |
| Authors: | Jaleel, Aamer; Mattina, Matthew; Jacob, Bruce |
| Type: | Presentation |
| Keywords: | bioinformatics; chip-multiprocessor (CMP); cache hierarchy |
| Issue Date: | Feb-2006 |
| Citation: | "Last-level cache (LLC) performance of data-mining workloads on a CMP--A case study of parallel bioinformatics workloads." Aamer Jaleel, Matthew Mattina, and Bruce Jacob. Proc. 12th International Symposium on High Performance Computer Architecture (HPCA 2006), Austin TX, February 2006. |
| Abstract: | With the continuing growth in the amount of genetic data, members of the bioinformatics community are developing a variety of data-mining applications to understand the data and discover meaningful information. These applications are important in defining the design and performance decisions of future high-performance microprocessors. This paper presents a detailed data-sharing analysis and chip-multiprocessor (CMP) cache study of several multithreaded data-mining bioinformatics workloads. For a CMP with a three-level cache hierarchy, we model the last level of the cache hierarchy as either multiple private caches or a single cache shared amongst the different cores of the CMP. Our experiments show that the bioinformatics workloads exhibit significant data-sharing: 50–95% of the data cache is shared by the different threads of the workload. Furthermore, regardless of the amount of data cache shared, for some workloads as many as 98% of the accesses to the last-level cache are to shared data cache lines. Additionally, the amount of data-sharing exhibited by the workloads is a function of the total cache size available: the larger the data cache, the better the sharing behavior. Thus, partitioning the available last-level cache silicon area into multiple private caches can cause applications to lose their inherent data-sharing behavior. For the workloads in this study, a shared 32MB last-level cache is able to capture a tremendous amount of data-sharing and outperform a 32MB private cache configuration by several orders of magnitude. Specifically, with shared last-level caches, the bandwidth demands beyond the last-level cache can be reduced by factors of 3–625 when compared to private last-level caches. |
| URI: | http://hdl.handle.net/1903/7453 |
| Appears in Collections: | Electrical & Computer Engineering Research Works |
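
The abstract reports two sharing metrics: the fraction of cached data shared by several threads, and the fraction of last-level cache accesses that go to shared lines. The sketch below illustrates how such metrics could be computed from a memory-access trace. It is not the authors' methodology: the function name `sharing_stats`, the 64-byte line size, and the trace format are assumptions made for illustration, and the sketch ignores cache capacity and replacement, which the study does model.

```python
from collections import defaultdict

LINE_SIZE = 64  # bytes per cache line (assumed; not stated in the abstract)


def sharing_stats(trace, line_size=LINE_SIZE):
    """Return (shared_line_fraction, shared_access_fraction) for a trace.

    trace: iterable of (thread_id, byte_address) pairs (hypothetical format).
    A line counts as shared when at least two distinct threads touch it.
    Capacity and replacement are ignored, so this is an upper-level sketch only.
    """
    threads_per_line = defaultdict(set)
    accesses_per_line = defaultdict(int)
    for thread_id, address in trace:
        line = address // line_size  # map the byte address to its cache line
        threads_per_line[line].add(thread_id)
        accesses_per_line[line] += 1

    total_lines = len(threads_per_line)
    if total_lines == 0:
        return 0.0, 0.0
    total_accesses = sum(accesses_per_line.values())

    shared_lines = [ln for ln, t in threads_per_line.items() if len(t) >= 2]
    shared_accesses = sum(accesses_per_line[ln] for ln in shared_lines)
    return len(shared_lines) / total_lines, shared_accesses / total_accesses
```

The two fractions can differ widely: a small set of shared lines may still receive most of the accesses, which is consistent with the abstract's observation that the share of accesses to shared lines can be high regardless of how much of the cache is shared.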