Technical Reports from UMIACS
http://hdl.handle.net/1903/7
2020-07-05T08:30:40Z
Design and Evaluation of Monolithic Computers Implemented Using Crossbar ReRAM
http://hdl.handle.net/1903/22243
Jagasivamani, Meenatchi; Walden, Candace; Singh, Devesh; Li, Shang; Kang, Luyi; Asnaashari, Mehdi; Dubois, Sylvain; Jacob, Bruce; Yeung, Donald
A monolithic computer is an emerging architecture in which a multicore CPU and a high-capacity main memory system are integrated on a single die. We believe such architectures will become possible in the near future thanks to nonvolatile memory technologies such as the resistive random access memory, or ReRAM, from Crossbar Incorporated. Crossbar's ReRAM can be fabricated in a standard CMOS logic process, allowing it to be integrated into a CPU's die. The ReRAM cells are fabricated between metal wires and do not employ per-cell access transistors, leaving the bulk of the base silicon area vacant. This means that a CPU can be monolithically integrated directly underneath the ReRAM memory, giving the cores massively parallel access to main memory.
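Why memory-level parallelism matters can be reasoned about with Little's law: sustained throughput equals the number of requests in flight divided by the latency of each request. The sketch below applies it to memory bandwidth. The function name and all numbers are hypothetical and illustrative. They are not ReRAM or DRAM characteristics reported in the paper.

```python
def sustained_bandwidth_gbps(outstanding_requests: int, request_bytes: int, latency_ns: float) -> float:
    """Little's law applied to memory: throughput = concurrency / latency.

    One byte per nanosecond equals 10^9 bytes per second, so the result is in GB/s.
    Hypothetical helper for illustration only.
    """
    return outstanding_requests * request_bytes / latency_ns


# Illustrative numbers only: same 100 ns latency, 64-byte requests.
few_in_flight = sustained_bandwidth_gbps(10, 64, 100.0)     # 6.4 GB/s
many_in_flight = sustained_bandwidth_gbps(1000, 64, 100.0)  # 640.0 GB/s
```

With latency held fixed, bandwidth grows linearly with the number of concurrent accesses the memory system can sustain. This is the property that many independently accessible arrays sitting above the cores would be expected to exploit.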
This paper presents the characteristics of Crossbar's ReRAM technology, informing architects of how ReRAM can enable monolithic computers. It then develops a CPU and memory system architecture around those characteristics, specifically to exploit the unprecedented memory-level parallelism. The architecture employs a tiled CPU and incorporates into every compute tile memory controllers that support a variable access granularity, enabling high scalability. Lastly, the paper evaluates monolithic computers experimentally on graph kernels and streaming computations. Our results show that, compared to a DRAM-based tiled CPU, a monolithic computer achieves 4.7x higher performance on the graph kernels and roughly matches its performance on the streaming computations. At a future 7nm technology node, a monolithic computer could outperform the conventional system by 66% on the streaming computations.
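One way to see why a variable access granularity helps irregular workloads such as graph kernels is to compute the fraction of transferred bytes a program actually uses. The sketch below makes two assumptions: requests are contiguous and aligned, and they are served in whole units of a fixed granularity. The 8-byte and 64-byte granularities are illustrative choices, not the paper's controller parameters. The function name is hypothetical.

```python
import math


def transfer_efficiency(useful_bytes: int, granularity_bytes: int) -> float:
    """Fraction of transferred bytes that are actually used.

    Assumes a contiguous, granularity-aligned request of `useful_bytes` that
    must be served in whole units of `granularity_bytes` (hypothetical model).
    """
    units = math.ceil(useful_bytes / granularity_bytes)
    return useful_bytes / (units * granularity_bytes)


# Irregular graph access: one 8-byte value per neighbor lookup.
coarse = transfer_efficiency(8, 64)   # 0.125: 56 of 64 bytes are wasted
fine = transfer_efficiency(8, 8)      # 1.0
# Streaming access: a long contiguous run is served efficiently either way.
stream = transfer_efficiency(4096, 64)  # 1.0
```

Under this simple model, fine-grained accesses keep scattered graph traversals from wasting most of the memory bandwidth, while coarse accesses lose nothing on streaming code.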
2019-07-16T00:00:00Z
Exploiting Multi-Loop Parallelism on Heterogeneous Microprocessors
http://hdl.handle.net/1903/18886
Zuzak, Michael; Yeung, Donald
Heterogeneous microprocessors integrate CPUs and GPUs on the same chip, providing fast CPU-GPU communication and enabling cores to compute on data "in place." These advantages will permit integrated GPUs to exploit a smaller unit of parallelism. One challenge, however, will be exposing sufficient parallelism to keep all of the on-chip compute resources fully utilized. In this paper, we argue that integrated CPU-GPU chips should exploit parallelism from multiple loops simultaneously. One example of this is nested parallelism, in which one or more inner SIMD loops are nested underneath a parallel outer (non-SIMD) loop. By scheduling the parallel outer loop on multiple CPU cores, multiple dynamic instances of the inner SIMD loops can be scheduled on the GPU cores. This boosts GPU utilization and parallelizes the non-SIMD code. Our preliminary results show that exploiting such multi-loop parallelism provides a 3.12x performance gain over exploiting parallelism from individual loops one at a time.
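The nested-parallelism structure described above can be sketched on a CPU alone. A thread pool stands in for the CPU cores running the parallel outer loop. A plain element-wise function stands in for the inner SIMD loop that would be offloaded to the GPU. All names and the per-iteration work are hypothetical. Because of Python's global interpreter lock, this shows the shape of the schedule, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def inner_simd_loop(row: List[float], scale: float) -> List[float]:
    # Stand-in for a GPU kernel: each element is processed independently,
    # so every dynamic instance of this loop is a candidate SIMD launch.
    return [scale * x for x in row]


def outer_loop_body(i: int, row: List[float]) -> float:
    # Non-SIMD work for outer iteration i (would run on a CPU core).
    scale = 1.0 + i                        # hypothetical per-iteration setup
    scaled = inner_simd_loop(row, scale)   # one dynamic inner-loop instance
    return sum(scaled)                     # hypothetical non-SIMD reduction


def nested_parallel(matrix: List[List[float]], cpu_workers: int = 4) -> List[float]:
    # Parallel outer loop: iterations are spread across worker threads, so
    # several inner-loop instances can be in flight at the same time.
    with ThreadPoolExecutor(max_workers=cpu_workers) as pool:
        return list(pool.map(outer_loop_body, range(len(matrix)), matrix))


results = nested_parallel([[1.0, 2.0], [3.0, 4.0]])  # [3.0, 14.0]
```

Running the inner loops one outer iteration at a time would expose only one SIMD instance at a time. Distributing the outer loop lets several instances become available concurrently, which is the utilization argument the abstract makes.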
2016-11-10T00:00:00Z
Body Maps on Human Chromosomes
http://hdl.handle.net/1903/17177
Cherniak, Christopher; Rodriguez-Esteban, Raul
An exploration of the hypothesis that human genes are organized somatotopically: For each autosomal chromosome, its tissue-specific genes tend to have relative positions on the chromosome that mirror the corresponding positions of the tissues in the body. In addition, there appears to be a division of labor: Such a homunculus representation on a chromosome holds significantly for either the anteroposterior or the dorsoventral body axis. In turn, anteroposterior and dorsoventral chromosomes tend to occupy separate zones in the sperm-cell nucleus. One functional rationale for such large-scale organization is efficient interconnection within the genome.
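A simple way to quantify whether gene order "mirrors" tissue order along a body axis is a rank correlation. This would compare each tissue-specific gene's position on the chromosome with the position of its tissue along the axis. The sketch below computes Spearman's rank correlation with average ranks for ties. Using Spearman's rho here is an illustrative assumption, not necessarily the statistic the authors used, and the example positions are made up. Judging significance would additionally require a test such as a permutation test.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of the ranks (undefined if either input is constant)."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical data: gene positions (Mb) and tissue positions (0 = head, 1 = feet).
gene_pos = [12.0, 45.5, 80.2, 130.0]
tissue_pos = [0.1, 0.3, 0.6, 0.9]
rho = spearman(gene_pos, tissue_pos)  # 1.0: a perfectly order-preserving map
```

A value near +1 or -1 indicates that gene order tracks tissue order along the chosen axis, in either orientation. A value near 0 indicates no such mapping.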
2015-11-08T00:00:00Z
Accurate computation of Galerkin double surface integrals in the 3-D boundary element method
http://hdl.handle.net/1903/16394
Adelman, Ross; Gumerov, Nail A.; Duraiswami, Ramani
Many boundary element integral equation kernels are based on the Green’s functions of the Laplace and Helmholtz equations in three dimensions. These include, for example, the Laplace, Helmholtz, elasticity, Stokes, and Maxwell equations. Integral equation formulations lead to more compact but dense linear systems. These dense systems are often solved iteratively via Krylov subspace methods, which may be accelerated via the fast multipole method. Galerkin formulations offer advantages for such integral equations: they address problems associated with kernel singularity and lead to symmetric, better-conditioned matrices. However, the Galerkin method requires each entry of the system matrix to be computed as a double surface integral over one or more pairs of triangles. A number of semi-analytical methods exist for treating these integrals; each has some issues, which this paper discusses. We present novel methods to compute, to any specified accuracy, all the integrals that arise in Galerkin formulations involving kernels based on the Laplace and Helmholtz Green’s functions. Integrals involving completely geometrically separated triangles are non-singular and are computed using a technique based on spherical harmonics and multipole expansions and translations, which results in the integration of polynomial functions over the triangles.
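For orientation, the sketch below evaluates one Galerkin matrix entry for the Laplace kernel 1/(4π|x − y|) over two well-separated triangles. The paper uses spherical harmonics and multipole translations for this case. This sketch instead swaps in brute-force product Gaussian quadrature: a symmetric 3-point rule on each triangle, refined by midpoint subdivision. That is the kind of baseline such methods can be checked against. It handles only the Laplace kernel and only separated triangles. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]
Tri = Tuple[Vec, Vec, Vec]

# Symmetric 3-point rule (exact for quadratics): barycentric points, weight 1/3 each.
BARY = [(2 / 3, 1 / 6, 1 / 6), (1 / 6, 2 / 3, 1 / 6), (1 / 6, 1 / 6, 2 / 3)]


def area(t: Tri) -> float:
    (ax, ay, az), (bx, by, bz), (cx, cy, cz) = t
    ux, uy, uz = bx - ax, by - ay, bz - az
    vx, vy, vz = cx - ax, cy - ay, cz - az
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    return 0.5 * math.sqrt(nx * nx + ny * ny + nz * nz)


def points(t: Tri) -> List[Vec]:
    a, b, c = t
    return [tuple(l0 * a[k] + l1 * b[k] + l2 * c[k] for k in range(3)) for l0, l1, l2 in BARY]


def midpoint(p: Vec, q: Vec) -> Vec:
    return tuple((p[k] + q[k]) / 2 for k in range(3))


def subdivide(t: Tri, levels: int) -> List[Tri]:
    # Midpoint subdivision: each level splits a triangle into 4 congruent children.
    if levels == 0:
        return [t]
    a, b, c = t
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    out: List[Tri] = []
    for child in ((a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)):
        out.extend(subdivide(child, levels - 1))
    return out


def laplace_double_integral(s: Tri, t: Tri, levels: int = 2) -> float:
    """Approximate the double integral of 1/(4*pi*|x - y|) over S x T (separated S, T)."""
    total = 0.0
    for ss in subdivide(s, levels):
        ws = area(ss) / 3
        for tt in subdivide(t, levels):
            wt = area(tt) / 3
            for x in points(ss):
                for y in points(tt):
                    total += ws * wt / (4 * math.pi * math.dist(x, y))
    return total


S: Tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
T: Tri = ((0.0, 0.0, 100.0), (1.0, 0.0, 100.0), (0.0, 1.0, 100.0))
far_field = area(S) * area(T) / (4 * math.pi * 100.0)  # point-source estimate
value = laplace_double_integral(S, T, levels=1)
```

For triangles far apart relative to their size, the result approaches the point-source estimate A_S·A_T/(4πd). The quadrature becomes inaccurate as the triangles approach each other, and it fails outright when they share a vertex or an edge, or coincide.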
Integrals involving triangles that share a vertex or an edge, or that are coincident, are treated via scaling and symmetry arguments, combined with automatic recursive geometric decomposition of the integrals. Example results are presented, and the developed software is available as open source.
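To illustrate the kind of scaling and symmetry argument involved, consider the Laplace kernel on a coincident pair of triangles. This standard derivation is offered as an illustration, not necessarily the paper's exact procedure. The kernel is homogeneous of degree −1, and each surface element scales by λ², so scaling both triangles by λ multiplies the integral by λ³. Isometries leave it unchanged. Midpoint subdivision splits a triangle T into four children, each congruent to ½T:

```latex
I(A,B) = \int_A \int_B \frac{1}{4\pi \lVert \mathbf{x}-\mathbf{y} \rVert}\, dS_{\mathbf{y}}\, dS_{\mathbf{x}},
\qquad
I(\lambda A, \lambda B) = \lambda^{3}\, I(A,B)

T = T_1 \cup T_2 \cup T_3 \cup T_4,
\qquad
I(T_i, T_i) = \left(\tfrac{1}{2}\right)^{3} I(T,T) = \tfrac{1}{8}\, I(T,T)

I(T,T) = \sum_{i=1}^{4} I(T_i,T_i) + \sum_{i \neq j} I(T_i,T_j)
       = \tfrac{1}{2}\, I(T,T) + \sum_{i \neq j} I(T_i,T_j)
\quad\Longrightarrow\quad
I(T,T) = 2 \sum_{i \neq j} I(T_i,T_j)
```

The 12 ordered off-diagonal pairs only share an edge (the central child with each corner child) or a vertex (corner children with each other). The coincident case therefore reduces to these weaker singular cases. Because the derivation relies on homogeneity, it applies directly to the Laplace kernel but not to the Helmholtz kernel e^{ikr}/(4πr), whose wavenumber introduces a length scale.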
2015-05-29T00:00:00Z