Theses and Dissertations from UMD

Permanent URI for this communityhttp://hdl.handle.net/1903/2

New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means that there may be up to a 4 month delay in the appearance of a give thesis/dissertation in DRUM

More information is available at Theses and Dissertations at University of Maryland Libraries.

Browse

Search Results

Now showing 1 - 4 of 4
  • Thumbnail Image
    Item
    Studying the Impact of Multicore Processor Scaling on Cache Coherence Directories via Reuse Distance Analysis
    (2015) Zhao, Minshu; Yeung,, Donald; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Directories are one key part of a processor's cache coherence hardware, and constitute one of the main bottlenecks in multicore processor scaling, e.g. core count and cache size scaling. Many research effects have tried to improve the scalability of the directory, but most of them only simulate a few architecture configurations. It is important to study the directory's architecture dependency, as the CPUs continue to scale. This is because besides applications, directory behaviors are also highly sensitive to architecture. Varying core count directly affects the amount of sharing in the directory, and varying the data cache hierarchy affects the directory access stream. But unfortunately, exploring the huge design space of multiple core counts and cache configurations is challenging using traditional architectural simulation due to the slow speed of simulations. This thesis studies the directory using multicore reuse distance analysis. It extends existing multicore reuse distance techniques, developing a method to extract directory access information from the parallel LRU stacks used to acquire private-stack reuse distance profiles. This thesis implements this method in a PIN-based profiler to study the directory behavior, including the directory access pattern and directory content, and to analyze current directory techniques. The profile results show that the directory accesses are highly dependent on cache size, exhibiting a 3.5x drop when scaling the data cache size from 16KB to 1MB; the sharing causes the ratio of directory entry to cache blocks to drop below 50%; and the majority of the accesses are to a small percentage of the directory entries. Cache simulations are performed to validate the profiling results, showing the profiled results are within 14.5% of simulation on average. This thesis also analyzes different directory techniques using the insights from the profiler. The case studies on the Cuckoo, DGD, SCD techniques and multi-level directories show that required directory size varies significantly with CPU scaling, the opportunity of compressing private data decreases with cache scaling, reducing the sharer list size is an effective technique and a small L1 directory is sufficient to capture most of the latency critical accesses respectively.
  • Thumbnail Image
    Item
    DECENTRALIZED RESOURCE ORCHESTRATION FOR HETEROGENEOUS GRIDS
    (2012) Lee, Jaehwan; Sussman, Alan; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Modern desktop machines now use multi-core CPUs to enable improved performance. However, achieving high performance on multi-core machines without optimized software support is still difficult even in a single machine, because contention for shared resources can make it hard to exploit multiple computing resources efficiently. Moreover, more diverse and heterogeneous hardware platforms (e.g. general-purpose GPU and Cell processors) have emerged and begun to impact grid computing. Given that heterogeneity and diversity are now a major trend going forward, grid computing must support these environmental changes. In this dissertation, I design and evaluate a decentralized resource management scheme to exploit heterogeneous multiple computing resources effectively. I suggest resource management algorithms that can efficiently utilize a diverse computational environment, including multiple symmetric computing entities and heterogeneous multi-computing entities, and achieve good load-balancing and high total system throughput. Moreover, I propose expressive resource description techniques to accommodate more heterogeneous environments, allowing incoming jobs with complex requirements to be matched to available resources. First, I develop decentralized resource management frameworks and job scheduling schemes to exploit multi-core nodes in peer-to-peer grids. I present two new load-balancing schemes that explicitly account for resource sharing and contention across multiple cores within a single machine, and propose a simple performance prediction model that can represent a continuum of resource sharing among cores of a CPU. Second, I provide scalable resource discovery and load balancing techniques to accommodate nodes with many types of computing elements, such as multi-core CPUs and GPUs, in a peer-to-peer grid architecture. My scheme takes into account diverse aspects of heterogeneous nodes to maximize overall system throughput as well as minimize messaging costs without sacrificing the failure resilience provided by an underlying peer-to-peer overlay network. Finally, I propose an expressive resource discovery method to support multi-attribute, range-based job constraints. The common approach of using simple attribute indexes does not suffice, as range-based constraints may be satisfied by more than a single value. I design a compact ID-based representation for resource characteristics, and integrate this representation into the decentralized resource discovery framework. By extensive experimental results via simulation, I show that my schemes can match heterogeneous jobs to heterogeneous resources both effectively (good matches are found, load is balanced), and efficiently (the new functionality imposes little overhead).
  • Thumbnail Image
    Item
    New Algorithmic Techniques for Large Scale Volumetric Data Visualization on Parallel Architectures
    (2008-07-16) Wang, Qin; JaJa, Joseph; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Volume visualization is widely used as an effective approach for the visual exploration, computational analysis, and manipulation of volumetric datasets. Due to the dramatic advances in imaging instruments and computing technologies, such datasets are now appearing at a very fast rate with increasingly larger sizes in many engineering, science and medical applications. Isosurface and direct volume rendering(DVR) are two of the most widely used techniques to render such datasets. This dissertation introduces novel techniques for rendering isosurfaces and volumes, and extends these techniques to multiprocessor architectures. We first focus on cluster-based techniques for isosurface extraction and rendering using polygonal approximation. We present a new simple indexing scheme and data layout approach, which enable scalable and efficient isosurface generation. This algorithm is the first known parallel algorithm to achieve provable load balancing on multiprocessor systems. We also develop an algorithm to generate isosurfaces using ray-casting on multi-core processors. Our method is based on a hybrid strategy that begins with an object order traversal of the data followed by ray-casting on ordered sets of an adaptive number of subcubes, one set for each small group of pixels on the image. We develop a multithreaded implementation, which uses new dynamic load balancing techniques that start with an image partitioning for the initial stage and then perform dynamic allocation of groups of ray-casting tasks among the different threads. The strategy ensures almost equal loads among the cores while maintaining spatial data locality. This scheme is extended to perform direct volume rendering and is shown to achieve similar improvements in terms of overall performance, load balancing, and scalability. We conduct a large number of tests for all our algorithms on the University of Maryland Visualization Cluster and on the 8-core Clovertown platform using a wide variety of datasets such as Richtmyer-Meshkov Instability dataset (7.5GB for each time step) and Visible Human dataset (~1GB). We obtain results that consistently validate the efficiency and the scalability of our algorithms. In particular, the overall performance of our hybrid ray-casting scheme achieves an interactive rendering rate on high resolution (1024x1024) screens for all the datasets tested.
  • Thumbnail Image
    Item
    Efficient rendering of large 3-D and 4-D scalar fields
    (2008-05-09) Kim, Jusub; JaJa, Joseph; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Rendering volumetric data, as a compute/communication intensive and highly parallel application, represents the characteristics of future workloads for desktop computers. Interactively rendering volumetric data has been a challenging problem due to its high computational and communication requirements. With the consistent trend toward high resolution data, it has remained a difficult problem despite the continuous increase in processing power, because of the increasing performance gap between computation and communication. On the other hand, the new multi-core architecture trend in computational units in PC, which can be characterized by parallelism and heterogeneity, provides both opportunities and challenges. While the new on-chip parallel architectures offer opportunities for extremely high performance, widespread use of those parallel processors requires extensive changes in previous algorithms to take advantage of the new architectures. In this dissertation, we develop new methods and techniques to support interactive rendering of large volumetric data. In particular, we present a novel method to layout data on disk for efficiently performing an out-of-core axis-aligned slicing of large multidimensional scalar fields. We also present a new method to efficiently build an out-of-core indexing structure for n-dimensional volumetric data. Then, we describe a streaming model for efficiently implementing volume ray casting on a heterogeneous compute resource environment. We describe how we implement the model on SONY/TOSHIBA/IBM Cell Broadband Engine and on NVIDIA CUDA architecture. Our results show that our out-of-core techniques significantly reduce the communication bandwidth requirements and that our streaming model very effectively makes use of the strengths of those heterogeneous parallel compute resource environment for volume rendering. In all cases, we achieve scalability and load balancing, while hiding memory latency.