THE EFFECTS OF AGGRESSIVE OUT-OF-ORDER MECHANISMS ON THE MEMORY SUB-SYSTEM
Jacob, Bruce L
Contrary to existing work that demonstrates significant performance improvements with larger reorder buffers, the work presented in this dissertation shows that larger instruction windows do not necessarily provide such improvements. Using detailed models of the DRAM system and the memory subsystem, we show that increasing out-of-order aggressiveness by growing the reorder buffer beyond 128 entries no longer buys any improvement in processor performance. In fact, we observe that it can actually degrade processor performance. Additionally, this dissertation demonstrates a non-intuitive problem associated with the out-of-order execution of memory instructions: reordering memory instructions can degrade the performance of the memory subsystem. Specifically, we show that increasing out-of-order aggressiveness in terms of reorder buffer size increases the frequency of replay traps and data cache misses. The presentation of this problem is in itself of utmost significance: the very mechanisms commonly used to improve performance are sources of performance degradation in the memory subsystem. We observe that while the negative effects of out-of-order execution occur for only a small fraction of the time with small reorder buffers, eliminating other sources of stalls by increasing out-of-order capability turns these unexpected side effects in the memory subsystem into a significant overhead. This reveals that one cannot overlook rarely occurring events in the memory subsystem. To gain insight into the source of the problem, we attempt to measure the degree to which memory system performance relies on out-of-order execution. Borrowing the network communication concept of windowing, we vary the load/store scheduling window independently of the ALU scheduling window.
Our study revealed that memory instructions issued out-of-order are the primary reason for the increase in the frequency of replay traps. On the other hand, the out-of-order issue of memory instructions is responsible for both constructive and destructive references to the data cache. Incorporating detailed memory subsystem models and a realistic DRAM model into existing simulators, and filtering the destructive references out of the total cache references, can allow aggressive out-of-order cores to reap the true benefits of out-of-order execution.
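
The idea of sizing the load/store scheduling window independently of the ALU scheduling window can be illustrated with a toy model. The sketch below is not the simulator used in the dissertation: the function name `eligible`, the `"alu"`/`"mem"` tags, and the rule that a window covers the oldest un-issued instructions of its class are all assumptions made for illustration, and data dependences, replay traps, and caches are ignored entirely.

```python
from typing import List, Set


def eligible(kinds: List[str], issued: Set[int],
             alu_window: int, mem_window: int) -> List[int]:
    """Return program-order indices that may issue under separate windows.

    kinds  -- instruction classes in program order, each "alu" or "mem"
    issued -- indices that have already issued
    Each class may issue only from its own window: the oldest
    `alu_window` (or `mem_window`) un-issued instructions of that class.
    """
    limits = {"alu": alu_window, "mem": mem_window}
    seen = {"alu": 0, "mem": 0}  # un-issued instructions seen so far, per class
    candidates = []
    for index, kind in enumerate(kinds):
        if index in issued:
            continue
        if seen[kind] < limits[kind]:
            candidates.append(index)
        seen[kind] += 1
    return candidates


# Hypothetical instruction stream alternating memory and ALU operations.
stream = ["mem", "alu", "mem", "alu", "mem", "alu"]

# A memory window of 1 forces loads/stores to issue in program order,
# while the ALU window of 3 still lets ALU operations issue out of order.
print(eligible(stream, set(), alu_window=3, mem_window=1))
```

In this toy setting, shrinking `mem_window` while holding `alu_window` fixed restricts only the reordering of memory instructions, which is the kind of knob the windowing experiment described above relies on.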