Theses and Dissertations from UMD

Permanent URI for this community: http://hdl.handle.net/1903/2

New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means that there may be up to a 4-month delay in the appearance of a given thesis/dissertation in DRUM.

More information is available at Theses and Dissertations at University of Maryland Libraries.

Search Results

Now showing 1 - 2 of 2
  • Item
    On Efficient GPGPU Computing for Integrated Heterogeneous CPU-GPU Microprocessors
    (2021) Gerzhoy, Daniel; Yeung, Donald; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Heterogeneous microprocessors that integrate a CPU and GPU on a single chip provide low-overhead CPU-GPU communication and permit sharing of on-chip resources to which a traditional discrete GPU would not have direct access. These features allow codes that heretofore were suitable only for multi-core CPUs or discrete GPUs to run efficiently on a heterogeneous CPU-GPU microprocessor, in some cases with increased performance. This thesis first discusses previously published work on exploiting nested MIMD-SIMD parallelization for heterogeneous microprocessors. We examined loop structures in which one or more regular data-parallel loops are nested within a parallel outer loop that can contain irregular code (e.g., with control divergence). By scheduling the outer loop on the multicore CPU portion of the microprocessor, each CPU thread launches dynamic, independent instances of the inner loop onto the GPU, boosting GPU utilization while simultaneously parallelizing the outer loop. The second portion of the thesis explores heterogeneous producer-consumer data sharing between the CPU and GPU on the microprocessor. One advantage of tight integration, the shared on-chip cache system, can lessen the impact that memory accesses have on performance and power. Producer-consumer data sharing commonly occurs between the CPU and GPU portions of programs, but large kernels, whose data footprint far exceeds the capacity of a typical CPU cache, cause shared data to be evicted before it is reused. We propose Pipelined CPU-GPU Scheduling for Caches, a locality transformation for producer-consumer relationships between CPUs and GPUs. By intelligently scheduling the execution of the producer and consumer in a software pipeline, evictions can be avoided, eliminating DRAM accesses, saving power, and improving performance. To keep the cached data on chip, we allow the producer to run ahead of the consumer by a certain number of loop iterations or threads. Choosing this "run-ahead distance" becomes the main constraint on scheduling work in this software pipeline, and we provide a method for statically predicting it. We assert that, with intelligent scheduling and the hardware and software mechanisms to support it, more workloads can be gainfully executed on integrated heterogeneous CPU-GPU microprocessors than previously assumed. (Illustrative sketches of both techniques appear after this list.)
  • Item
    Codes with efficient erasure correction
    (2020) Chen, Zitan; Barg, Alexander; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Distributed storage systems are becoming increasingly ubiquitous in the emerging era of the Internet of Things. Major internet technology companies employ large-scale distributed storage systems to accommodate the massive amounts of data generated and requested by global users. The need for reliable and efficient storage of immense amounts of data calls for new applications and further development of classical error-correcting codes. This dissertation is devoted to a study of codes with efficient erasure correction for distributed storage systems. The efficiency of erasure correction is often assessed by two performance metrics: repair bandwidth and locality. In this dissertation we address several problems for each of these two metrics. We construct families of codes with optimal communication complexity for erasure correction ("repair bandwidth") under a heterogeneous storage model, and we derive several results for the problem of optimal repair of Reed-Solomon codes. We also construct families of cyclic and convolutional codes with locality, extending the range of parameters for which such families were previously known. (An illustrative sketch of the locality metric appears after this list.)
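
The two techniques in the GPGPU thesis above lend themselves to short illustrations. First, a minimal sketch of nested MIMD-SIMD parallelization, assuming a C++ compiler with OpenMP target offload; the pragmas, loop bounds, and workload are illustrative stand-ins, not the thesis's actual benchmarks or runtime. An irregular outer loop is spread across CPU threads, and each CPU thread launches its own independent instance of a regular inner loop onto the GPU.

// Sketch only: nested MIMD-SIMD parallelization on an integrated CPU-GPU
// processor, assuming OpenMP with target offload. The outer, irregular loop
// runs as MIMD threads on the CPU cores; each thread offloads the regular,
// data-parallel inner loop to the GPU. Without offload support the target
// region falls back to the host, so the code still compiles and runs.
#include <cstdio>
#include <vector>

int main() {
    const int outer = 64;        // irregular outer iterations (MIMD on CPU)
    const int inner = 1 << 16;   // regular inner iterations (SIMD on GPU)
    std::vector<double> result(outer, 0.0);
    double* res = result.data();

    #pragma omp parallel for schedule(dynamic)  // one outer iteration per CPU thread
    for (int i = 0; i < outer; ++i) {
        // Irregular, possibly divergent work stays on the CPU,
        // e.g. a data-dependent trip count for the inner loop.
        int n = (i % 2 == 0) ? inner : inner / 2;

        double sum = 0.0;
        // Each CPU thread launches its own dynamic instance of the inner loop on the GPU.
        #pragma omp target teams distribute parallel for reduction(+:sum) map(tofrom:sum)
        for (int j = 0; j < n; ++j) {
            sum += 0.5 * j;
        }
        res[i] = sum;
    }

    std::printf("result[0] = %f\n", res[0]);
    return 0;
}

On an integrated part, the low launch overhead is what makes issuing many small, per-thread kernels worthwhile; on a discrete GPU the same nesting would likely be dominated by launch and transfer costs.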
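
Second, a minimal CPU-only sketch of the run-ahead idea behind Pipelined CPU-GPU Scheduling for Caches, assuming C++20 semaphores. The chunk size, run-ahead distance, and thread roles here are illustrative assumptions; in the thesis the producer and consumer would run on the CPU and the integrated GPU, and the run-ahead distance is predicted statically rather than hard-coded.

// Sketch only: the producer may run at most kRunAhead chunks ahead of the
// consumer, so produced data is consumed while it is still cache-resident
// instead of being evicted to DRAM. All parameters are illustrative.
#include <cstdio>
#include <functional>
#include <semaphore>
#include <thread>
#include <vector>

constexpr int kChunks = 256;        // total chunks of work
constexpr int kChunkElems = 4096;   // elements per chunk (sized to fit in cache)
constexpr int kRunAhead = 4;        // "run-ahead distance": max chunks in flight

std::counting_semaphore<kChunks> filled(0);        // chunks ready to consume
std::counting_semaphore<kChunks> space(kRunAhead); // bounds the producer's lead

std::vector<float> buffer(static_cast<size_t>(kChunks) * kChunkElems);

void producer() {
    for (int c = 0; c < kChunks; ++c) {
        space.acquire();                          // block if already kRunAhead ahead
        for (int i = 0; i < kChunkElems; ++i)     // produce one chunk
            buffer[c * kChunkElems + i] = static_cast<float>(c + i);
        filled.release();                         // chunk c is ready
    }
}

void consumer(double& total) {
    for (int c = 0; c < kChunks; ++c) {
        filled.acquire();                         // wait for chunk c
        double sum = 0.0;
        for (int i = 0; i < kChunkElems; ++i)     // consume while still cached
            sum += buffer[c * kChunkElems + i];
        total += sum;
        space.release();                          // give back one slot of run-ahead
    }
}

int main() {
    double total = 0.0;
    std::thread p(producer), q(consumer, std::ref(total));
    p.join();
    q.join();
    std::printf("total = %f\n", total);
    return 0;
}

Choosing kRunAhead too large reproduces the original problem (the in-flight working set outgrows the cache), while choosing it too small serializes the pipeline, which is why the thesis treats selecting the run-ahead distance as the central scheduling constraint.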
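
For the erasure-correction dissertation, a toy sketch of the locality metric: a code has locality r if any single erased symbol can be repaired by reading at most r other symbols. The XOR parity-group code below is a generic textbook-style illustration, not a construction from the dissertation, whose cyclic and convolutional codes with locality are far more sophisticated.

// Sketch only: split the data into groups of r symbols and add one XOR parity
// per group, so a single erased symbol is repaired by reading only its r
// group-mates (the locality) instead of the whole codeword.
#include <cstdint>
#include <cstdio>
#include <vector>

constexpr size_t r = 4;  // locality: symbols read to repair one erasure

// Encode: append one XOR parity per group of r data symbols.
std::vector<uint8_t> encode(const std::vector<uint8_t>& data) {
    std::vector<uint8_t> code = data;
    for (size_t g = 0; g < data.size(); g += r) {
        uint8_t parity = 0;
        for (size_t i = g; i < g + r && i < data.size(); ++i) parity ^= data[i];
        code.push_back(parity);
    }
    return code;  // layout: [data symbols | one parity per group]
}

// Repair one erased data symbol from its group parity and r-1 surviving
// group-mates: only r symbols are read in total.
uint8_t repair(const std::vector<uint8_t>& code, size_t data_len, size_t erased) {
    size_t group = erased / r;
    uint8_t value = code[data_len + group];                    // group parity
    for (size_t i = group * r; i < (group + 1) * r && i < data_len; ++i)
        if (i != erased) value ^= code[i];                     // surviving group-mates
    return value;
}

int main() {
    std::vector<uint8_t> data = {1, 2, 3, 4, 5, 6, 7, 8};
    std::vector<uint8_t> code = encode(data);
    size_t erased = 5;                                         // pretend data[5] is lost
    uint8_t recovered = repair(code, data.size(), erased);
    std::printf("recovered %u (expected %u)\n",
                static_cast<unsigned>(recovered), static_cast<unsigned>(data[erased]));
    return 0;
}

Repair bandwidth, the dissertation's other metric, instead measures how much data must be downloaded to repair a symbol rather than how many symbols are read.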