Optimizing SMT Processors for High Single-Thread Performance

dc.contributor.author: Dorai, Gautham K.
dc.contributor.author: Yeung, Donald
dc.contributor.author: Choi, Seungryul
dc.date.accessioned: 2004-05-31T23:24:43Z
dc.date.available: 2004-05-31T23:24:43Z
dc.date.created: 2003-01
dc.date.issued: 2003-02-05
dc.description.abstract: Simultaneous Multithreading (SMT) processors achieve high processor throughput at the expense of single-thread performance. This paper investigates resource allocation policies for SMT processors that preserve, as much as possible, the single-thread performance of designated "foreground" threads, while still permitting other "background" threads to share resources. Since background threads on such an SMT machine have a near-zero performance impact on foreground threads, we refer to the background threads as transparent threads. Transparent threads are ideal for performing low-priority or non-critical computations, with applications in process scheduling, subordinate multithreading, and on-line performance monitoring. To realize transparent threads, we propose three mechanisms for maintaining the transparency of background threads: slot prioritization, background thread instruction-window partitioning, and background thread flushing. In addition, we propose three mechanisms to boost background thread performance without sacrificing transparency: aggressive fetch partitioning, foreground thread instruction-window partitioning, and foreground thread flushing. We implement our mechanisms on a detailed simulator of an SMT processor and evaluate them using 8 benchmarks, including 7 from the SPEC CPU2000 suite. Our results show that when cache and branch predictor interference are factored out, background threads introduce less than 1% performance degradation on the foreground thread. Furthermore, maintaining the transparency of background threads reduces their throughput by only 23% relative to an equal-priority scheme. To demonstrate the usefulness of transparent threads, we study Transparent Software Prefetching (TSP), an implementation of software data prefetching using transparent threads. Due to its near-zero overhead, TSP enables prefetch instrumentation for all loads in a program, eliminating the need for profiling. TSP, without any profile information, achieves a 9.52% performance gain across 6 SPEC benchmarks, whereas conventional software prefetching guided by cache-miss profiles increases performance by only 2.47%. (Also UMIACS-TR-2003-07.)
dc.format.extent: 942937 bytes
dc.format.mimetype: application/postscript
dc.identifier.uri: http://hdl.handle.net/1903/1252
dc.language.iso: en_US
dc.relation.isAvailableAt: Digital Repository at the University of Maryland
dc.relation.isAvailableAt: University of Maryland (College Park, Md.)
dc.relation.isAvailableAt: Tech Reports in Computer Science and Engineering
dc.relation.isAvailableAt: UMIACS Technical Reports
dc.relation.ispartofseries: UM Computer Science Department; CS-TR-4436
dc.relation.ispartofseries: UMIACS; UMIACS-TR-2003-07
dc.title: Optimizing SMT Processors for High Single-Thread Performance
dc.type: Technical Report
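
To make the Transparent Software Prefetching (TSP) idea in the abstract concrete, the sketch below shows ordinary software data prefetching in C. It is an illustrative assumption, not the report's implementation: the function name, array, and prefetch distance are hypothetical, and TSP itself issues this kind of prefetch code from a transparent background thread rather than inline in the foreground loop. __builtin_prefetch is the GCC/Clang prefetch intrinsic.

    /* Illustrative sketch only (hypothetical names and constants); TSP runs
       equivalent prefetch instrumentation in a transparent background thread
       so its overhead is hidden from the foreground thread. */
    #include <stddef.h>

    #define PREFETCH_DISTANCE 16  /* elements ahead of the current access */

    /* Sum an array, prefetching data that will be needed a few iterations
       ahead.  Arguments to __builtin_prefetch: address, rw = 0 (read),
       locality = 3 (keep in all cache levels). */
    long sum_with_prefetch(const long *data, size_t n)
    {
        long total = 0;
        for (size_t i = 0; i < n; i++) {
            if (i + PREFETCH_DISTANCE < n)
                __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 3);
            total += data[i];
        }
        return total;
    }

Because a transparent thread consumes only resources the foreground thread would otherwise leave idle, the abstract's claim is that such instrumentation can be applied to every load without profiling and with near-zero overhead.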

Files

Original bundle (2 files)

CS-TR-4436.ps
  Size: 920.84 KB
  Format: PostScript

CS-TR-4436.pdf
  Size: 295.91 KB
  Format: Adobe Portable Document Format
  Description: Auto-generated copy of CS-TR-4436.ps