A. James Clark School of Engineering

Permanent URI for this community: http://hdl.handle.net/1903/1654

The collections in this community comprise faculty research works, as well as graduate theses and dissertations.

  • Prefetching Vs The Memory System : Optimizations for Multi-core Server Platforms
    (2007-10-25) Srinivasan, Sadagopan; Jacob, Bruce; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    This dissertation investigates prefetching schemes for servers with respect to realistic memory systems. A large body of research has been done on prefetching, even for server workloads that have sparse locality. Yet real systems disable prefetching in server settings, suggesting a fundamental disconnect between research and practice. Our theory, a major point of this thesis, is that this disconnect is due to the use of simplistic memory models -- and our experimental results show that, among other things, using simplistic models can over-predict system performance by up to 65%. Our investigation proceeds as follows:

    (In)Accuracy of Simplistic Memory Models. We demonstrate the degree of inaccuracy of models commonly used in system design: in particular, simple models are reasonably accurate when applied to simple systems (e.g., uniprocessors), but they become increasingly inaccurate as the complexity of the system grows -- as cores are added, and as prefetching is added.

    Memory-side Prefetching. We then perform a detailed case study of a well-known server-oriented prefetch scheme -- memory-side sequential prefetch -- to develop an understanding of the interaction between prefetch schemes and memory systems. In particular, we find that the projected performance gains fail to materialize due to the lack of locality in the server benchmarks and the bandwidth constraints introduced by the prefetch requests. We conclude that prefetching studies so far have been using the wrong metric to gauge idleness of the memory subsystem and consequently saturate the bus with prefetch requests.

    Multi-core Server Prefetching. We use our newfound understanding of the interplay between prefetching and memory systems to develop a novel scheme for prefetching in server platforms that does interact well with real memory systems. We find that tuning the aggressiveness of prefetching to the average memory latency, which depends on the available bandwidth, performs best in server platforms.
  • Extended Split-Issue: Enabling Flexibility in the Hardware Implementation of NUAL VLIW DSPs
    (2004-06) Iyer, Bharath; Srinivasan, Sadagopan; Jacob, Bruce
    VLIW architecture based DSPs have become widespread due to the combined benefits of simple hardware and compiler-extracted instruction-level parallelism. However, the VLIW instruction set architecture and its hardware implementation are tightly coupled, especially so for Non-Unit Assumed Latency (NUAL) VLIWs. The problem of object code compatibility across processors having different numbers of functional units or hardware latencies has been the Achilles' heel of this otherwise powerful architecture. In this paper, we propose eXtended Split-Issue (XSI), a novel mechanism that breaks the instruction packet syntax of an NUAL VLIW compiler without violating the dataflow dependences. XSI provides a designer the freedom of disassociating the hardware implementation of the NUAL VLIW processor from the instruction set architecture. Further, we investigate fairly radical (in the context of VLIW) changes to the hardware—like removing an adder, adding a multiplier, and incorporating simultaneous multithreading (SMT)—to show that our technique works for a variety of hardware configurations without compromising on performance. The technique can be used in both single-threaded and multi-threaded architectures to achieve a level of flexibility heretofore unavailable in the VLIW arena.