Hill-Climbing SMT Processor Resource Distribution
Date
2006-04-27
Authors
Choi, Seungryul
Advisor
Yeung, Donald
Abstract
The key to high performance in SMT processors lies in optimizing the
distribution of shared resources among simultaneously executing threads. Existing
resource distribution techniques optimize performance only indirectly. They
infer potential performance bottlenecks by observing indicators, such as
instruction occupancy or cache miss count, and take actions to try to alleviate
them. While the corrective actions are designed to improve performance, their
actual performance impact is not known since end performance is never
monitored. Consequently, opportunities for performance gains are lost whenever
the corrective actions do not effectively address the actual performance
bottlenecks occurring in the SMT processor pipeline.

In this dissertation, we propose a different approach to SMT processor resource
distribution that optimizes end performance directly. Our approach observes
the impact that resource distribution decisions have on performance at runtime,
and feeds this information back to the resource distribution mechanisms to
improve future decisions. By successively applying and evaluating different
resource distributions, our approach tries to learn the best distribution over
time. Because we perform learning on-line, learning time is crucial. We
develop a hill-climbing SMT processor resource distribution technique that
efficiently learns the best resource distribution by following the performance
gradient within the resource distribution space.
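
The idea of following the performance gradient can be illustrated with a minimal sketch. It is not the dissertation's algorithm: the function names (`hill_climb`, `toy_perf`), the step size `delta`, the epoch count, and the toy performance model are all assumptions made for illustration. On real hardware, `perf` would stand for end performance measured over one interval under a trial distribution.

```python
import math
from typing import Callable, List, Optional


def hill_climb(perf: Callable[[List[int]], float],
               shares: List[int],
               delta: int = 4,
               epochs: int = 100) -> List[int]:
    """Greedy hill-climbing over per-thread resource shares (illustrative only).

    Each epoch tries moving `delta` units of a shared resource from one
    thread to another, evaluates every such trial with `perf`, and keeps
    the best trial if it beats the current distribution.
    """
    shares = list(shares)
    best = perf(shares)
    n = len(shares)
    for _ in range(epochs):
        best_trial: Optional[List[int]] = None
        best_score = best
        for gainer in range(n):
            for loser in range(n):
                # Skip no-op moves and moves that would make a share negative.
                if gainer == loser or shares[loser] < delta:
                    continue
                trial = list(shares)
                trial[gainer] += delta
                trial[loser] -= delta
                score = perf(trial)
                if score > best_score:
                    best_trial, best_score = trial, score
        if best_trial is None:
            break  # no neighboring distribution improves performance
        shares, best = best_trial, best_score
    return shares


def toy_perf(shares: List[int]) -> float:
    """Hypothetical stand-in for measured performance.

    Thread 1 benefits twice as much as thread 0, with diminishing returns.
    """
    weights = [1.0, 2.0]
    return sum(w * math.sqrt(s) for w, s in zip(weights, shares))


if __name__ == "__main__":
    # Start from an even split of 64 resource entries between two threads.
    print(hill_climb(toy_perf, [32, 32]))
```

Each step only moves resources between threads, so the total stays fixed. The search stops at a local optimum, which is the global one here only because the toy model is concave.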

This dissertation makes three contributions within the context of
learning-based SMT processor resource distribution. First, we characterize and
quantify the time-varying performance behavior of SMT processors. This
analysis provides understanding of the behavior and guides the design of our
hill-climbing algorithm. Second, we present a hill-climbing SMT processor
resource distribution technique that performs learning on-line. The
performance evaluation of our approach shows an 11.4% gain over ICOUNT, an 11.5%
gain over FLUSH, and a 2.8% gain over DCRA across a large set of 63
multiprogrammed workloads. Third, we compare existing resource distribution
techniques to an ideal learning-based technique that performs learning off-line
to show how far the existing techniques fall short of their potential
performance. This limit study identifies the performance bottlenecks of the
existing techniques, showing that the performance of ICOUNT, FLUSH, and DCRA is
13.2%, 13.5%, and 6.6% lower, respectively, than the ideal performance. Our
hill-climbing based resource distribution, however, handles most of these
bottlenecks properly, achieving performance 4.1% lower than the ideal case.