Browsing by Author "Liu, Wanli"
Now showing 1 - 3 of 3
Item
Enhancing LTP-Driven Cache Management Using Reuse Distance Information (2007-06-18)
Liu, Wanli; Yeung, Donald
Traditional caches employ the LRU management policy to drive replacement decisions. However, previous studies have shown LRU can perform significantly worse than the theoretical optimum, OPT [1]. To better match OPT, it is necessary to aggressively anticipate the future memory references performed in the cache. Recently, several researchers have tried to approximate OPT management by predicting last touch references [2, 3, 4, 5]. Existing last touch predictors (LTPs) either correlate last touch references with execution signatures, like instruction traces [3, 4] or last touch history [5], or they predict cache block lifetimes based on reference [2] or cycle [6] counts. On a predicted last touch, the referenced cache block is marked for early eviction. This permits cache blocks lower in the LRU stack but with shorter reuse distances to remain in cache longer, resulting in additional cache hits. This paper investigates two novel techniques to improve LTP-driven cache management. First, we propose exploiting reuse distance information to increase LTP accuracy. Specifically, we correlate a memory reference's last touch outcome with its global reuse distance history. Our results show that for an 8-way 1 MB L2 cache, a 74 KB RD-LTP can reduce the cache miss rate by 11.5% and 14.5% compared to LvP and AIP [2], two state-of-the-art last touch predictors. These performance gains are achieved because RD-LTPs exhibit a much higher prediction rate than existing LTPs, and RD-LTPs often avoid evicting LNO last touches [5], increasing the proportion of OPT last touches they evict. Second, we also propose predicting actual reuse distance values using reuse distance predictors (RDPs). An RDP is very similar to an RD-LTP, except its predictor table stores exact reuse distance values instead of last touch outcomes. Because RDPs predict reuse distances, we can distinguish between LNO and OPT last touches more accurately. Our results show an 80 KB RDP can reduce the miss rate by an additional 3.7% compared to an RD-LTP.
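To make the predictor mechanics concrete, below is a minimal, illustrative Python sketch of an RD-LTP-style last touch predictor: a table of 2-bit saturating counters indexed by a hash of the global (quantized) reuse distance history. The names, table size, history length, and quantization are our assumptions for exposition, not details taken from the paper.

    # Illustrative sketch only: sizes, hashing, and quantization are assumed.
    HIST_LEN, TABLE_SIZE, ASSOC = 3, 4096, 8

    def stack_distances(trace):
        """Backward LRU stack distance of each access (None on first touch)."""
        stack, out = [], []                          # most recently used last
        for addr in trace:
            if addr in stack:
                out.append(len(stack) - 1 - stack.index(addr))
                stack.remove(addr)
            else:
                out.append(None)
            stack.append(addr)
        return out

    class RDLTP:
        """Last touch predictor indexed by a hash of the global reuse-distance
        history; each table entry is a 2-bit saturating counter."""
        def __init__(self):
            self.table = [1] * TABLE_SIZE            # weakly "not a last touch"
            self.hist = [0] * HIST_LEN               # global quantized RD history

        def _index(self):
            h = 0
            for b in self.hist:
                h = (h * 131 + b) % TABLE_SIZE
            return h

        def predict_last_touch(self):
            return self.table[self._index()] >= 2

        def train(self, was_last_touch):
            i = self._index()
            if was_last_touch:
                self.table[i] = min(3, self.table[i] + 1)
            else:
                self.table[i] = max(0, self.table[i] - 1)

        def update_history(self, rd):
            bucket = 0 if rd is None else min(rd, 2 * ASSOC).bit_length()
            self.hist = self.hist[1:] + [bucket]

    # Toy training loop over an address trace: an access is labeled a last
    # touch when its block's next reuse lies at stack depth >= ASSOC, i.e.,
    # when LRU would miss the block anyway.
    trace = [0, 1, 2, 3, 0, 4, 5, 6, 7, 8, 9, 0, 1]
    backward = stack_distances(trace)
    forward = list(reversed(stack_distances(list(reversed(trace)))))
    ltp = RDLTP()
    for rd_back, rd_fwd in zip(backward, forward):
        hint = ltp.predict_last_touch()              # the early-eviction hint a cache would act on
        ltp.train(rd_fwd is None or rd_fwd >= ASSOC) # observed last-touch outcome
        ltp.update_history(rd_back)

A hardware realization would implement the same indexing and training flow with small SRAM tables; the sketch only shows the data flow, not the paper's actual 74 KB organization.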
Item
Probabilistic Replacement: Enabling Flexible Use of Shared Caches for CMPs (2008-09-05)
Liu, Wanli; Yeung, Donald
CMPs allow threads to share portions of the on-chip cache. Critical to successful sharing are the policies for allocating the shared cache to threads. Researchers have observed that the best allocation policy often depends on the degree of cache interference. Typically, workloads with high cache interference require explicit working set isolation via cache partitioning, while workloads with low cache interference perform well without explicit allocation, i.e., using LRU. While cache interference impacts cache allocation policies, relatively little is known about its root causes. This paper investigates how different sharing patterns in multiprogrammed workloads give rise to varying degrees of cache interference. We show cache interference is tied to the granularity of interleaving amongst inter-thread memory references: fine-grain interleaving yields high cache interference, while coarse-grain interleaving yields low cache interference. Interleaving granularity, in turn, arises from differences in the mapping and timing of per-thread references in the cache. For example, coarse-grain interleaving occurs anytime per-thread references map to disjoint cache sets, or are performed during distinct time intervals. We quantify such spatial and temporal isolation of per-thread memory references, and correlate its degree with LRU and partitioning performance. This paper also proposes probabilistic replacement (PR), a new cache allocation policy motivated by our reference interleaving insights. PR controls the rate at which inter-thread replacements transfer cache resources between threads, and hence, the per-thread cache allocation boundary. Because PR operates on inter-thread replacements, it adapts to cache interference. When interleaving is coarse-grained (low interference), inter-thread replacements are rare, so PR reverts to LRU. As interleaving becomes more fine-grained (higher interference), inter-thread replacements increase, and PR in turn creates more resistance to impede cache allocation. Our results show PR outperforms LRU, UCP, and an ideal cache partitioning technique by 4.86%, 3.15%, and 1.09%, respectively.

Item
Using Locality and Interleaving Information to Improve Shared Cache Performance (2009)
Liu, Wanli; Yeung, Donald; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Cache interference is found to play a critical role in optimizing cache allocation among the concurrent threads of a shared cache. The conventional LRU policy usually works well for low-interference workloads, while high cache interference among threads demands explicit allocation regulation, such as cache partitioning. Cache interference is shown to be tied to the granularity of inter-thread memory reference interleaving: high interference is caused by fine-grain interleaving, while low interference is caused by coarse-grain interleaving. Profiling of real multi-program workloads shows that cache set mapping and temporal phasing account for the variation in interleaving granularity. When memory references from different threads map to disjoint cache sets, or occur in distinct time windows, they tend to cause little interference due to coarse-grain interleaving. The interleaving granularity of a workload, measured by run length, is found to correlate with the preferred cache management policy: fine-grain interleaving workloads perform better with cache partitioning, while coarse-grain interleaving workloads perform better with LRU. Most existing shared cache management techniques are based on working set locality analysis. This dissertation studies shared cache performance by taking both locality and interleaving information into consideration. An oracle algorithm, which provides the theoretical best performance, is investigated to provide insight into the design of a better practical policy. Profiling and analysis of the oracle algorithm lead to the proposal of probabilistic replacement (PR), a novel cache allocation policy. With aggressor-thread information learned on-line, PR probabilistically evicts the bad-locality blocks of aggressor threads while preserving the good-locality blocks of non-aggressor threads. PR is shown to be able to adapt to the different interleaving granularities across sets and over time. Its flexibility in tuning the eviction probability also improves fairness among threads. Evaluation indicates that PR outperforms LRU, UCP, and ideal cache partitioning at moderate hardware cost.
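To illustrate how inter-thread replacements can be impeded probabilistically, below is a minimal single-set Python sketch in the spirit of PR. The eviction probability P_EVICT and the fallback of evicting the requesting thread's own LRU block are our assumptions for exposition, not the exact mechanism proposed in these works.

    # Illustrative sketch only: a toy, single-set model of a PR-style policy.
    import random

    ASSOC = 8
    P_EVICT = 0.5      # chance an inter-thread replacement proceeds (assumed)

    class PRSet:
        """One shared cache set holding (thread_id, addr) blocks, LRU first."""
        def __init__(self):
            self.blocks = []

        def access(self, tid, addr):
            """Return True on a hit; on a miss, insert addr with PR victim choice."""
            for i, (t, a) in enumerate(self.blocks):
                if a == addr:
                    self.blocks.append(self.blocks.pop(i))   # promote to MRU
                    return True
            if len(self.blocks) >= ASSOC:
                victim = 0                                   # global LRU position
                if self.blocks[victim][0] != tid and random.random() > P_EVICT:
                    # Impede the inter-thread replacement: fall back to evicting
                    # the requesting thread's own LRU block, if it has one here.
                    own = [i for i, (t, _) in enumerate(self.blocks) if t == tid]
                    if own:
                        victim = own[0]
                self.blocks.pop(victim)
            self.blocks.append((tid, addr))
            return False

    # With per-thread references in disjoint sets, inter-thread replacements
    # never occur and the set behaves exactly like LRU.
    s = PRSet()
    for tid, addr in [(0, 100), (1, 200), (0, 100), (1, 201)]:
        s.access(tid, addr)

With P_EVICT = 1, or whenever threads touch disjoint sets, the sketch degenerates to plain LRU, mirroring the observation above that PR reverts to LRU under coarse-grain interleaving.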
For single-program cache management, this dissertation also proposes a novel technique: the reuse distance last touch predictor (RD-LTP). RD-LTP is able to capture reuse distance information, which represents the intrinsic memory reference pattern. Based on this improved LT predictor, an MRU LT eviction policy is developed to select the right victim in the presence of incorrect LT predictions. In addition to the LT predictor, another predictor, the reuse distance predictor (RDP), is proposed, which is able to predict actual reuse distance values. Compared to various existing cache management techniques, these two novel predictors deliver higher cache performance with higher prediction coverage and accuracy at moderate hardware cost.
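For comparison with the RD-LTP sketch above, a minimal RDP-style table might store a reuse distance value per entry rather than a last touch bit. The history indexing mirrors the earlier sketch, and the last-value update rule is our simplification, not necessarily the dissertation's design.

    # Illustrative sketch only: an RDP-style table storing reuse distances.
    HIST_LEN, TABLE_SIZE, ASSOC = 3, 4096, 8

    class RDP:
        def __init__(self):
            self.table = [0] * TABLE_SIZE    # predicted reuse distance per entry
            self.hist = [0] * HIST_LEN       # global reuse-distance history

        def _index(self):
            h = 0
            for rd in self.hist:
                h = (h * 131 + rd) % TABLE_SIZE
            return h

        def predict_rd(self):
            return self.table[self._index()]

        def looks_like_last_touch(self):
            # A predicted reuse deeper than the associativity suggests LRU
            # would miss the block anyway, so early eviction is safe; a
            # simplified stand-in for separating OPT from LNO last touches.
            return self.predict_rd() >= ASSOC

        def train(self, observed_rd):
            self.table[self._index()] = observed_rd   # last-value prediction
            self.hist = self.hist[1:] + [observed_rd]

    p = RDP()
    for rd in [2, 3, 17, 2, 3]:
        p.train(rd)
    print(p.looks_like_last_touch())

Holding an actual distance rather than a binary outcome is what allows a policy to compare predicted reuses against the cache's reach and preferentially keep the blocks likely to be re-referenced soonest.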