Electrical & Computer Engineering Research Works
Permanent URI for this collectionhttp://hdl.handle.net/1903/1658
Browse
33 results
Search Results
Item Composing With Genetic Algorithms(International Computer Music Association, 1995-09) Jacob, BrucePresented is an application of genetic algorithms to the problem of composing music, in which GAs are used to produce a set of data filters that identify acceptable material from the output of a stochastic music generator. The algorithmic composition system variations is described and musical examples of its output are given. Also discussed briefly is the system’s application to microtonal music.Item Hardware/Software Co-Design of I/O Interfacing Hardware and Real-Time Device Drivers for Embedded Systems(1999-06) Stewart, David B.; Jacob, BruceWe have conceptualized a hardware-software codesign strategy for creating I/O interfacing hardware and real-time operating system device drivers for microcontrollers, enabling hardware independent access to I/O devices at near-zero overhead. We achieve this low overhead through the addition of a hardware mechanism to the microcontroller architecture that we call nanoprocessors. The architecture extensions are orthogonal to the underlying microarchitecture and can be implemented inexpensively, and are thus suitable for use in low-cost microcontrollers. Our current research is to validate this concept through extensive testing on a simulated processor, and to measure the cost-effectiveness of the hardware architecture extensions over a wide range of design choices.Item Software-Managed Address Translation(1997-02) Jacob, Bruce; Mudge, TrevorIn this paper we explore software-managed address translation. The purpose of the study is to specify the memory management design for a high clock-rate PowerPC implementation in which a simple design is a prerequisite for a fast clock and a short design cycle. We show that software-managed address translation is just as efficient as hardware- managed address translation, and it is much more flexible. Operating systems such as OSF/1 and Mach charge between 0.10 and 0.28 cycles per instruction (CPI) for address translation using dedicated memory-management hardware. Software-managed translation requires 0.05 CPI. Mechanisms to support such features as shared memory, superpages, sub-page protection, and sparse address spaces can be defined completely in software, allowing much more flexibility than in hardware-defined mechanisms.Item Virtual Memory: Issues of Implementation(IEEE Computer, 1998-06) Jacob, Bruce; Mudge, TrevorThe authors introduce basic virtual-memory technologies and then compare memory-management designs in three commercial microarchitectures. They show the diversity of virtual-memory support and, by implication, how this diversity can complicate and compromise system operations.Item Virtual Memory in Contemporary Microprocessors(IEEE, 1998) Jacob, Bruce; Mudge, TrevorTHIS SURVEY OF SIX COMMERCIAL MEMORY-MANAGEMENT DESIGNS DESCRIBES HOW EACH PROCESSOR ARCHITECTURE SUPPORTS THE COMMON FEATURES OF VIRTUAL MEMORY: ADDRESS SPACE PROTECTION, SHARED MEMORY, AND LARGE ADDRESS SPACES.Item A Performance Comparison of Contemporary DRAM Architectures(1999-05) Cuppu, Vinodh; Jacob, Bruce; Davis, Brian; Mudge, TrevorIn response to the growing gap between memory access time and processor speed, DRAM manufacturers have created several new DRAM architectures. This paper presents a simulation-based performance study of a representative group, each evaluated in a small system organization. These small-system organizations correspond to workstation-class computers and use on the order of 10 DRAM chips. The study covers Fast Page Mode, Extended Data Out, Synchronous, Enhanced Synchronous, Synchronous Link, Rambus, and Direct Rambus designs. Our simulations reveal several things: (a) current advanced DRAM technologies are attacking the memory bandwidth problem but not the latency problem; (b) bus transmission speed will soon become a primary factor limiting memory-system performance; (c) the post-L2 address stream still contains significant locality, though it varies from application to application; and (d) as we move to wider buses, row access time becomes more prominent, making it important to investigate techniques to exploit the available locality to decrease access time.Item DDR2 and Low Latency Variants(2000-07) Davis, Brian; Mudge, Trevor; Jacob, Bruce; Cuppu, VinodhThis paper describes a performance examination of the DDR2 DRAM architecture and the proposed cache-enhanced variants. These preliminary studies are based upon ongoing collaboration between the authors and the Joint Electronic Device Engineering Council (JEDEC) Low Latency DRAM Working Group, a working group within the JEDEC 42.3 Future DRAM Task Group. This Task Group is responsible for developing the DDR2 standard. The goal of the Low Latency DRAM Working Group is the creation of a single cache-enhanced (i.e. low-latency) architecture based upon this same interface. There are a number of proposals for reducing the average access time of DRAM devices, most of which involve the addition of SRAM to the DRAM device. As DDR2 is viewed as a future standard, these proposals are frequently applied to a DDR2 interface device. For the same reasons it is advantageous to have a single DDR2 specification, it is similarly beneficial to have a single low-latency specification. The authors are involved in ongoing research to evaluate which enhancements to the baseline DDR2 devices will yield lower average latency, and for what type of applications. To provide context, experimental results will be compared against those for systems utilizing PC100 SDRAM, DDR133 SDRAM, and Direct Rambus (DRDRAM). This work is just starting to produce performance data. Initial results show performance improvements for low-latency devices that are significant, but less so than a generational change in DRAM interface. It is also apparent that there are at least two classifications of applications: 1) those that saturate the memory bus, for which performance is dependent upon the potential bandwidth and bus utilization of the system; and 2) those that do not contain the access parallelism to fully utilize the memory bus, and for which performance is dependent upon the latency of the average primary memory access.Item Extended Split-Issue: Enabling Flexibility in the Hardware Implementation of NUAL VLIW DSPs(2004-06) Iyer, Bharath; Srinivasan, Sadagopan; Jacob, BruceVLIW architecture based DSPs have become widespread due to the combined benefits of simple hardware and compiler-extracted instruction-level parallelism. However, the VLIW instruction set architecture and its hardware implementation are tightly coupled, especially so for Non-Unit Assumed Latency (NUAL) VLIWs. The problem of object code compatibility across processors having different numbers of functional units or hardware latencies has been the Achilles' heel of this otherwise powerful architecture. In this paper, we propose eXtended Split-Issue (XSI), a novel mechanism that breaks the instruction packet syntax of an NUAL VLIW compiler without violating the dataflow dependences. XSI provides a designer the freedom of disassociating the hardware implementation of the NUAL VLIW processor from the instruction set architecture. Further, we investigate fairly radical (in the context of VLIW) changes to the hardware—like removing an adder, adding a multiplier, and incorporating simultaneous multithreading (SMT)—to show that our technique works for a variety of hardware configurations without compromising on performance. The technique can be used in both single-threaded and multi-threaded architectures to achieve a level of flexibility heretofore unavailable in the VLIW arena.Item Uniprocessor Virtual Memory Without TLBs(IEEE, 2001-05) Jacob, Bruce; Mudge, TrevorWe present a feasibility study for performing virtual address translation without specialized translation hardware. Removing address translation hardware and instead managing address translation in software has the potential to make the processor design simpler, smaller, and more energy-efficient at little or no cost in performance. The purpose of this study is to describe the design and quantify its performance impact. Trace-driven simulations show that software-managed address translation is just as efficient as hardware-managed address translation. Moreover, mechanisms to support such features such as shared memory, superpages, fine-grained protection, and sparse address spaces can be defined completely in software, allowing for more flexibility than in hardware-defined mechanisms.Item Instruction-Level Power Dissipation in the Intel XScale Embedded Microprocessor(2005-01) Varma, Ankush; Debes, Eric; Kozintsev, Igor; Jacob, BruceWe present an instruction-level power dissipation model of the Intel XScale R° microprocessor. The XScale implements the ARMTMISA, but uses an aggressive microarchitecture and a SIMD Wireless MMXTMco-processor to speed up execution of multimedia workloads in the embedded domain. Instruction-Level power modelling was ¯rst proposed by Tiwari et. al. in 1994. Adaptations of this model have been found to be applicable to simple ARM processors. Research also shows that instructions can be clustered into groups with similar energy characteristics. We adapt these methodologies to the significantly more complex XScale processor. We characterize the processor in terms of the energy costs of opcode execution, operand values, pipeline stalls etc. through accurate measurements on hardware. This instruction-based (rather than microarchitectural) approach allows us to build a high-speed power-accurate simulator that runs at MIPS-range speeds, while achieving accuracy better than 5%. The processor core accounts only for a portion of overall power consumption, and we move beyond the core to explore the issues involved in building a SystemC simulation framework that models power dissipation of complete systems quickly, flexibly and accurately.