Hardware Design, Prototyping and Studies of the Explicit Multi-Threading (XMT) Paradigm
MetadataShow full item record
With the end of exponential performance improvements in sequential computers, parallel computers, dubbed "chip multiprocessor", "multicore", or "manycore", has been introduced. Unfortunately, programming current parallel computers tends to be far more difficult than programming sequential computers. The Parallel Random Access Model (PRAM) is known to be an easy-to-program parallel computer model and has been widely used by theorists to develop parallel algorithms because it abstracts away architecture details and allows algorithm designers to focus on critical issues. The eXplicit Multi-Threading (XMT) PRAM-On-Chip project seeks to build an easy-to-program on-chip parallel processor by supporting a PRAM-like programming (performance) model. This dissertation focuses on the design, study of the micro-architecture of the XMT processor as well as performance optimization. The main contributions are:(1) Presented a scalable micro-architecture of the XMT based on high level description of the architecture. (2) Designed a synthesizable Verilog HDL (hardware design language) description of XMT, which lead to the first commitment to the silicon of the XMT processor, a 75 MHz XMT FPGA computer. With the same design, we expect to see the first XMT ASIC processor using IBM 90nm technology. (3) Proposed and implemented some architecture upgrades to the XMT: (i)value broadcasting, (ii)hardware/software co-managed prefetch buffers and (iii) hardware/software co-managed read-only buffers. (4) Quantitatively studied the performance of XMT using non-trivial application kernels with the 75 MHz XMT FPGA computer, in addition, the performance of a 800MHz XMT processor is projected. (5) The choice of not having local private caches in the XMT architecture is studied by comparing current architecture with an alternative one that includes conventional coherent private caches.