Tech Reports in Computer Science and Engineering
Permanent URI for this community: http://hdl.handle.net/1903/5
The technical report collections in this community are deposited by the Library of the Computer Science Department. If you have questions about these collections, please contact library staff at library@cs.umd.edu.
Search Results (3 items)
Pipelined CPU-GPU Scheduling for Caches (2021-03-23)
Gerzhoy, Daniel; Yeung, Donald
Heterogeneous microprocessors integrate a CPU and a GPU with a shared cache hierarchy on the same chip, affording low-overhead communication between the CPU's and the GPU's cores. Large array data structures are often communicated from the CPU to the GPU and back. While the on-chip cache hierarchy can support such CPU-GPU producer-consumer sharing, this almost never happens because of poor temporal reuse. Because the data structures can be quite large, by the time the consumer reads the data, it has already been evicted from the cache, even though the producer brought it on-chip when it originally wrote the data. As a result, CPU-GPU communication happens through main memory instead of the cache, hurting performance and energy. This paper exploits the on-chip caches in a heterogeneous microprocessor to improve CPU-GPU communication efficiency. We divide streaming computations executed by the CPU and GPU that exhibit producer-consumer sharing into chunks, and overlap the execution of CPU chunks with GPU chunks in a software pipeline. To enforce data dependences, the producer executes one chunk ahead of the consumer at all times. We also propose a low-overhead synchronization mechanism in which the CPU directly controls thread-block scheduling in the GPU to maintain the producer's "run-ahead distance" relative to the consumer. By adjusting the chunk size or run-ahead distance, we can make the CPU-GPU working set fit in the last-level cache (LLC), thus permitting producer-consumer sharing to occur through the LLC. We show through simulation that our technique reduces the number of DRAM accesses by 30.4%, improves performance by 26.8%, and lowers memory system energy by 27.4%, averaged across 7 benchmarks.

Nervous system maps on the C. elegans genome (2020-09-28)
Cherniak, Christopher; Mokhtarzada, Zekeria; Rodriguez-Esteban, Raul
This project begins from a synoptic point of view, focusing on the large-scale (global) landscape of the genome, along the lines of combinatorial network optimization in computational complexity theory [1]. Our research program here in turn originated along parallel lines in computational neuroanatomy [2,3,4,5]. Rather than mapping body structure onto the genome, the present report focuses on statistically significant mappings of the Caenorhabditis elegans nervous system onto its genome. Using published datasets, evidence is derived for a "wormunculus": a representation modeled on the homunculus, but laid out on the C. elegans genome. Somatic-genomic position correlations are tested mainly against public genome databases, using r^2 analyses and p-value evaluations. These findings appear to reveal some of the basic structural and functional organization of invertebrate nucleus and chromosome architecture. The design rationale for somatic maps on the genome may in turn be efficient interconnection. A further question this study raises: how do these various somatic maps mesh (interrelate, interact) with each other?

Boundary Element Solution of Electromagnetic Fields for Non-Perfect Conductors at Low Frequencies and Thin Skin Depths (2020-05-13)
Gumerov, Nail A.; Adelman, Ross N.; Duraiswami, Ramani
A novel boundary element formulation is developed for solving eddy-current problems in the thin skin depth approximation. It is assumed that the time-harmonic magnetic field outside the scatterers can be described using the quasistatic approximation. A two-term asymptotic expansion with respect to a small parameter characterizing the skin depth is derived for the magnetic and electric fields outside and inside the scatterer; it can be extended to higher-order terms if needed.
Introducing a special surface operator (the inverse surface gradient) reduces the complexity of the problem, and a method to compute this operator is developed. The resulting formulation involves only scalar quantities and requires computing surface operators that are standard in boundary element (method of moments) solutions of the Laplace equation. The formulation can be accelerated using the fast multipole method, and it is much faster than solving the vector Maxwell equations. The solutions are compared with the Mie solution for scattering from a sphere, and the solution error is studied. Computations for much more complex shapes of different topologies, including magnetic and electric field cages used in testing, are also performed and discussed.
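The chunked producer-consumer pipeline described in "Pipelined CPU-GPU Scheduling for Caches" can be illustrated with a small sketch. This is not the authors' implementation: their mechanism has the CPU directly control GPU thread-block scheduling and was evaluated in simulation, whereas here two Python threads stand in for the producer and consumer, and the function name `pipelined_run` and its parameters are hypothetical. A counting semaphore bounds how many chunks the producer may have in flight, so `run_ahead=1` models "the producer executes one chunk ahead of the consumer".

```python
import threading


def pipelined_run(n_elems, chunk_size=4, run_ahead=1):
    """Toy producer/consumer software pipeline over a shared array (hypothetical).

    The producer fills chunk c of `data`; the consumer sums chunk c once it is
    ready. At most run_ahead + 1 chunks are produced but not yet consumed,
    which bounds the live working set.
    Returns (total of all elements, peak number of produced-but-unconsumed chunks).
    """
    data = [0] * n_elems
    n_chunks = (n_elems + chunk_size - 1) // chunk_size
    ready = threading.Semaphore(0)               # chunks produced, not yet consumed
    slots = threading.Semaphore(run_ahead + 1)   # window enforcing the run-ahead distance
    lock = threading.Lock()
    partial = [0] * n_chunks
    in_flight = 0
    peak = 0

    def bounds(c):
        # Element range [lo, hi) covered by chunk c (the last chunk may be short).
        return c * chunk_size, min((c + 1) * chunk_size, n_elems)

    def producer():
        nonlocal in_flight, peak
        for c in range(n_chunks):
            slots.acquire()                      # stall if too far ahead of the consumer
            lo, hi = bounds(c)
            for i in range(lo, hi):
                data[i] = i * i                  # stand-in for the producer's computation
            with lock:
                in_flight += 1
                peak = max(peak, in_flight)
            ready.release()                      # publish chunk c to the consumer

    def consumer():
        nonlocal in_flight
        for c in range(n_chunks):
            ready.acquire()                      # wait until chunk c has been produced
            lo, hi = bounds(c)
            partial[c] = sum(data[lo:hi])        # stand-in for the consumer's computation
            with lock:
                in_flight -= 1
            slots.release()                      # let the producer start another chunk

    workers = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return sum(partial), peak
```

For example, `pipelined_run(10, chunk_size=4, run_ahead=1)` returns a total of 285 (the sum of i*i for i < 10) and a peak of at most 2 chunks in flight. Under this sketch's assumptions, the live working set is roughly (run_ahead + 1) × chunk_size elements, which is the quantity the report tunes so that it fits in the LLC.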
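For the boundary element report, the thin-skin regime can be made concrete with the standard skin-depth formula for a good conductor, δ = sqrt(2 / (ω μ σ)). The sketch below is not taken from the report: the helper names are hypothetical, and treating ε = δ / L (skin depth over a characteristic body size L) as the small parameter is an assumption for illustration; the report's own expansion parameter may be defined differently.

```python
import math

# Vacuum permeability in H/m. 4*pi*1e-7 is the classical value; the SI-2019
# measured value differs from it by less than one part in 10^9.
MU_0 = 4e-7 * math.pi


def skin_depth(freq_hz, sigma, mu_r=1.0):
    """Skin depth delta = sqrt(2 / (omega * mu * sigma)) of a good conductor, in metres.

    freq_hz: excitation frequency in Hz; sigma: conductivity in S/m;
    mu_r: relative permeability of the conductor.
    """
    omega = 2.0 * math.pi * freq_hz
    return math.sqrt(2.0 / (omega * mu_r * MU_0 * sigma))


def thin_skin_ratio(freq_hz, sigma, size_m, mu_r=1.0):
    """Hypothetical small parameter eps = delta / L for a body of size L (metres).

    eps much smaller than 1 indicates the thin-skin regime.
    """
    return skin_depth(freq_hz, sigma, mu_r) / size_m
```

With copper (σ ≈ 5.8 × 10^7 S/m) at 1 kHz, `skin_depth(1e3, 5.8e7)` is about 2.1 mm, so for a 10 cm object ε is roughly 0.02; quadrupling the frequency halves δ.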