Understanding and Optimizing High-Speed Serial Memory System Protocols
Performance improvements in memory systems have traditionally been obtained by scaling data bus width and speed. Maintaining this trend while continuing to satisfy the memory capacity demands of server systems is challenging because of the electrical constraints posed by high-speed parallel buses. To satisfy the dual needs of memory bandwidth and memory system capacity, leaders in the memory system industry have proposed new memory system protocols. These protocols replace the conventional memory bus interface between the memory controller and the memory modules with narrow, high-speed, unidirectional point-to-point interfaces. The memory controller communicates with the memory modules using a packet-based protocol, which is translated into conventional DRAM commands at the memory modules.
Memory latency is widely accepted as one of the key performance bottlenecks in computer architecture. Hence, any change to the memory sub-system architecture and protocol can have a significant impact on overall system performance. In the first part of this dissertation, we conducted an extensive study and analysis of the behavior of the newly proposed memory architecture to identify clearly how it impacts memory sub-system performance and what the key performance limiters are. We then used the insights gained from this analysis to propose two optimization techniques focused on improving the performance of the memory system.
We first evaluated the performance of the current de facto serial memory system standard, FBDIMM (Fully Buffered DIMM), with respect to the conventional wide-bus architectures that have been in use for decades. We found that the relative performance of an FBDIMM system with respect to a conventional DDRx system was a strong function of bandwidth utilization, with FBDIMM systems performing worse at low utilization and often outperforming DDRx systems at higher utilization. More interestingly, we found that many of the memory controller policies that have been in use in DDRx systems performed similarly on an FBDIMM system.
Memory latency typically has a significant impact on overall system performance. FBDIMM systems, by using daisy chaining and serialization, increase the default latency cost of a memory transaction. In a longer memory channel, i.e., a channel with 8 DIMMs, inefficient link utilization and memory controller scheduling policies can contribute to a further reduction in system performance. We propose two main optimization techniques to tackle these inefficiencies: reordering data on the return link and buffering at the memory module. Both of these techniques lower read latency by 10-20% and improve application performance by 2-25%.
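As a rough illustration of why channel length matters in a daisy-chained design, the following Python sketch models idle-channel read latency as a function of a DIMM's position in the chain. It is a deliberately simplified toy model, not the simulation methodology used in this dissertation: the names `ChannelParams`, `read_latency_ns`, and `average_read_latency_ns` are hypothetical, and every timing value is an assumed placeholder rather than a measured or specified FBDIMM parameter. The model also ignores queueing, link contention, and scheduling effects, which are the inefficiencies the proposed techniques target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelParams:
    """Assumed, illustrative timings in nanoseconds (not from the dissertation)."""
    hop_delay_ns: float = 2.0       # assumed delay to cross one link, per direction
    dram_access_ns: float = 30.0    # assumed DRAM access time at the module
    serialization_ns: float = 6.0   # assumed time to serialize read data on the return link


def read_latency_ns(dimm_index: int, params: ChannelParams = ChannelParams()) -> float:
    """Idle-channel read latency for the DIMM at dimm_index (0 = closest to the controller).

    The command crosses (dimm_index + 1) links on the way out, and the data
    crosses the same number of links on the way back.
    """
    if dimm_index < 0:
        raise ValueError("dimm_index must be non-negative")
    links = dimm_index + 1
    round_trip = 2 * links * params.hop_delay_ns
    return round_trip + params.dram_access_ns + params.serialization_ns


def average_read_latency_ns(num_dimms: int, params: ChannelParams = ChannelParams()) -> float:
    """Mean idle read latency, assuming reads are spread evenly across all DIMMs."""
    if num_dimms < 1:
        raise ValueError("num_dimms must be at least 1")
    total = sum(read_latency_ns(i, params) for i in range(num_dimms))
    return total / num_dimms


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(f"{n} DIMM(s): average idle read latency {average_read_latency_ns(n):.1f} ns")
```

Under these assumed values, the average idle latency grows with the number of DIMMs on the channel because each additional module adds a link crossing in both directions. Real systems add queueing and link-contention delays on top of this baseline.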