Electrical & Computer Engineering Research Works
Permanent URI for this collection: http://hdl.handle.net/1903/1658
8 results
Search Results
Item: Software-Managed Address Translation (1997-02)
Jacob, Bruce; Mudge, Trevor
In this paper we explore software-managed address translation. The purpose of the study is to specify the memory management design for a high clock-rate PowerPC implementation in which a simple design is a prerequisite for a fast clock and a short design cycle. We show that software-managed address translation is just as efficient as hardware-managed address translation, and it is much more flexible. Operating systems such as OSF/1 and Mach charge between 0.10 and 0.28 cycles per instruction (CPI) for address translation using dedicated memory-management hardware. Software-managed translation requires 0.05 CPI. Mechanisms to support such features as shared memory, superpages, sub-page protection, and sparse address spaces can be defined completely in software, allowing much more flexibility than in hardware-defined mechanisms.

Item: Virtual Memory: Issues of Implementation (IEEE Computer, 1998-06)
Jacob, Bruce; Mudge, Trevor
The authors introduce basic virtual-memory technologies and then compare memory-management designs in three commercial microarchitectures. They show the diversity of virtual-memory support and, by implication, how this diversity can complicate and compromise system operations.

Item: Virtual Memory in Contemporary Microprocessors (IEEE, 1998)
Jacob, Bruce; Mudge, Trevor
This survey of six commercial memory-management designs describes how each processor architecture supports the common features of virtual memory: address space protection, shared memory, and large address spaces.

Item: A Performance Comparison of Contemporary DRAM Architectures (1999-05)
Cuppu, Vinodh; Jacob, Bruce; Davis, Brian; Mudge, Trevor
In response to the growing gap between memory access time and processor speed, DRAM manufacturers have created several new DRAM architectures. This paper presents a simulation-based performance study of a representative group, each evaluated in a small system organization.
These small-system organizations correspond to workstation-class computers and use on the order of 10 DRAM chips. The study covers Fast Page Mode, Extended Data Out, Synchronous, Enhanced Synchronous, Synchronous Link, Rambus, and Direct Rambus designs. Our simulations reveal several things: (a) current advanced DRAM technologies are attacking the memory bandwidth problem but not the latency problem; (b) bus transmission speed will soon become a primary factor limiting memory-system performance; (c) the post-L2 address stream still contains significant locality, though it varies from application to application; and (d) as we move to wider buses, row access time becomes more prominent, making it important to investigate techniques to exploit the available locality to decrease access time.

Item: DDR2 and Low Latency Variants (2000-07)
Davis, Brian; Mudge, Trevor; Jacob, Bruce; Cuppu, Vinodh
This paper describes a performance examination of the DDR2 DRAM architecture and the proposed cache-enhanced variants. These preliminary studies are based upon ongoing collaboration between the authors and the Joint Electronic Device Engineering Council (JEDEC) Low Latency DRAM Working Group, a working group within the JEDEC 42.3 Future DRAM Task Group. This Task Group is responsible for developing the DDR2 standard. The goal of the Low Latency DRAM Working Group is the creation of a single cache-enhanced (i.e. low-latency) architecture based upon this same interface. There are a number of proposals for reducing the average access time of DRAM devices, most of which involve the addition of SRAM to the DRAM device. As DDR2 is viewed as a future standard, these proposals are frequently applied to a DDR2 interface device. For the same reasons it is advantageous to have a single DDR2 specification, it is similarly beneficial to have a single low-latency specification.
The authors are involved in ongoing research to evaluate which enhancements to the baseline DDR2 devices will yield lower average latency, and for what type of applications. To provide context, experimental results will be compared against those for systems utilizing PC100 SDRAM, DDR133 SDRAM, and Direct Rambus (DRDRAM). This work is just starting to produce performance data. Initial results show performance improvements for low-latency devices that are significant, but less so than a generational change in DRAM interface. It is also apparent that there are at least two classifications of applications: 1) those that saturate the memory bus, for which performance is dependent upon the potential bandwidth and bus utilization of the system; and 2) those that do not contain the access parallelism to fully utilize the memory bus, and for which performance is dependent upon the latency of the average primary memory access.

Item: Uniprocessor Virtual Memory Without TLBs (IEEE, 2001-05)
Jacob, Bruce; Mudge, Trevor
We present a feasibility study for performing virtual address translation without specialized translation hardware. Removing address translation hardware and instead managing address translation in software has the potential to make the processor design simpler, smaller, and more energy-efficient at little or no cost in performance. The purpose of this study is to describe the design and quantify its performance impact. Trace-driven simulations show that software-managed address translation is just as efficient as hardware-managed address translation.
Moreover, mechanisms to support features such as shared memory, superpages, fine-grained protection, and sparse address spaces can be defined completely in software, allowing for more flexibility than in hardware-defined mechanisms.

Item: High-Performance DRAMs in Workstation Environments (2001-10)
Cuppu, Vinodh; Jacob, Bruce; Davis, Brian; Mudge, Trevor
This paper presents a simulation-based performance study of several of the new high-performance DRAM architectures, each evaluated in a small system organization. These small-system organizations correspond to workstation-class computers and use only a handful of DRAM chips (~10, as opposed to ~1 or ~100). The study covers Fast Page Mode, Extended Data Out, Synchronous, Enhanced Synchronous, Double Data Rate, Synchronous Link, Rambus, and Direct Rambus designs. Our simulations reveal several things: (a) current advanced DRAM technologies are attacking the memory bandwidth problem but not the latency problem; (b) bus transmission speed will soon become a primary factor limiting memory-system performance; (c) the post-L2 address stream still contains significant locality, though it varies from application to application; (d) systems without L2 caches are feasible for low- and medium-speed CPUs (1GHz and below); and (e) as we move to wider buses, row access time becomes more prominent, making it important to investigate techniques to exploit the available locality to decrease access time.

Item: The Trading Function in Action (ACM (Association for Computing Machinery) Publications, 1996-09)
Jacob, Bruce; Mudge, Trevor
This paper describes a commercial software and hardware platform for telecommunications and multimedia processing. The software architecture loosely follows the CORBA and ODP standards of distributed computing and supports a number of application types on different hardware configurations.
This paper is the result of lessons learned in the process of designing, building, and modifying an industrial telecommunications platform. In particular, the use of the trading function in the design of the system led to such benefits as support for the dynamic evolution of the system, the ability to dynamically add services and data types to a running system, support for heterogeneous systems, and a simple design performing well enough to handle traffic in excess of 40,000 busy-hour calls.
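
As a rough illustration of the trading function mentioned in the abstract above, the sketch below shows the general idea: services export offers to a trader at run time, and clients import offers by service type and property constraint. It is not the platform described in the paper. The names `Offer`, `Trader`, `export`, `withdraw`, and `import_`, and the sample service types and properties, are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Offer:
    """A service offer: a type name, descriptive properties, and a reference."""
    service_type: str
    properties: Dict[str, Any]
    reference: Any  # hypothetical handle to the service, e.g. a node name


@dataclass
class Trader:
    """Hypothetical in-memory trader; exporters register offers, importers query them."""
    offers: List[Offer] = field(default_factory=list)

    def export(self, service_type: str, properties: Dict[str, Any], reference: Any) -> Offer:
        # Register a new offer while the system is running.
        offer = Offer(service_type, dict(properties), reference)
        self.offers.append(offer)
        return offer

    def withdraw(self, offer: Offer) -> None:
        # Remove a previously exported offer.
        self.offers.remove(offer)

    def import_(
        self,
        service_type: str,
        constraint: Callable[[Dict[str, Any]], bool] = lambda props: True,
    ) -> List[Offer]:
        # Return every offer of the requested type whose properties satisfy the constraint.
        return [
            o for o in self.offers
            if o.service_type == service_type and constraint(o.properties)
        ]


# Usage: two services are added to a running trader, then a client looks one up.
trader = Trader()
fax_offer = trader.export("fax", {"lines": 4}, "node-a")
trader.export("voicemail", {"lines": 16}, "node-b")
matches = trader.import_("fax", lambda props: props["lines"] >= 2)
```

Because clients resolve services through the trader rather than through fixed references, a new service type can be exported later without changing existing clients, which is the kind of dynamic evolution the abstract attributes to the trading function.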