Theses and Dissertations from UMD

Permanent URI for this community: http://hdl.handle.net/1903/2

New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date, so a given thesis/dissertation may take up to four months to appear in DRUM.

More information is available at Theses and Dissertations at University of Maryland Libraries.

Search Results

Now showing 1 - 5 of 5
  • Hybrid-PGAS Memory Hierarchy for Next Generation HPC Systems
    (2024) Johnson, Richard Bradford; Hollingsworth, Jeffrey K; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Demands on computational performance, power efficiency, data transfer, resource capacity, and resilience for next-generation high performance computing (HPC) systems present a host of new challenges. There is a growing disparity between computational performance and network and storage device throughput, and among the energy costs of computational, memory, and communication operations. Chapel is a powerful, high-level, parallel PGAS language designed to streamline development by addressing code complexities; it uses a shared-memory model for handling large, distributed-memory systems. I extended the capabilities of Chapel by providing support for persistent memory, with intrinsic and programmatic features, for HPC systems. In my approach I explored the efficacy of persistent memory in a hybrid-PGAS environment through latency-hiding analysis via cache monitoring, identification and mitigation of performance bottlenecks via data-centric analysis, and hardware profiling to assess performance costs versus benefits and energy footprint. To manage persistency and ensure resiliency, I developed a transaction system with ACID properties that supports hybrid-PGAS virtual addressing, together with a distributed checkpoint-and-recovery system.
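    The abstract names a transaction system with ACID properties over persistent memory but does not specify its design. The following is only a minimal single-node sketch of how atomicity and durability are commonly obtained for a persistent checkpoint, via an isolated working copy, an fsync'd snapshot, and an atomic rename; the names Checkpoint, put, and commit are hypothetical, not the dissertation's API.

    ```python
    import json
    import os
    import tempfile

    class Checkpoint:
        """Minimal ACID-style transactional checkpoint (illustrative only).

        Writes are staged in an isolated working copy; commit() writes a
        full snapshot to a temp file, fsyncs it, then atomically renames
        it over the old file, so a crash leaves either the old state or
        the new state, never a partial one.
        """

        def __init__(self, path):
            self.path = path
            self.state = {}
            if os.path.exists(path):
                with open(path) as f:
                    self.state = json.load(f)
            self.pending = dict(self.state)  # isolation: staged writes

        def put(self, key, value):
            self.pending[key] = value        # not visible until commit

        def commit(self):
            d = os.path.dirname(os.path.abspath(self.path))
            fd, tmp = tempfile.mkstemp(dir=d)
            with os.fdopen(fd, "w") as f:
                json.dump(self.pending, f)
                f.flush()
                os.fsync(f.fileno())         # durability
            os.replace(tmp, self.path)       # atomicity (POSIX rename)
            self.state = dict(self.pending)

        def abort(self):
            self.pending = dict(self.state)  # discard staged writes
    ```

    A real hybrid-PGAS design would additionally have to coordinate commits across distributed locales; that protocol is beyond this sketch.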
  • Methods and Tools for Real-Time Neural Image Processing
    (2023) Xie, Jing; Bhattacharyya, Shuvra; Chen, Rong; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    As a rapidly developing form of bioengineering technology, neuromodulation systems involve extracting information from signals acquired from the brain and utilizing that information to stimulate brain activity. Neuromodulation has the potential to treat a wide range of neurological diseases and psychiatric conditions, as well as to improve cognitive function. Neuromodulation integrates neural decoding and stimulation. As one of the two core parts of neuromodulation systems, neural decoding subsystems interpret signals acquired through neuroimaging devices. Neuroimaging is a field of neuroscience that uses imaging techniques to study the structure and function of the brain and other parts of the central nervous system. Extracting information from neuroimaging signals, as is required in neural decoding, involves key challenges due to requirements for real-time, energy-efficient, and accurate processing of the large-scale, high-resolution image data that are characteristic of neuromodulation systems. To address these challenges, we develop new methods and tools for the design and implementation of efficient neural image processing systems. Our contributions are organized along three complementary directions. First, we develop a prototype system for real-time neuron detection and activity extraction called the Neuron Detection and Signal Extraction Platform (NDSEP). This highly configurable system processes neural images from video streams in real time or off-line, and applies techniques of dataflow modeling to enable extensibility and experimentation with a wide variety of image processing algorithms. Second, we develop a parameter optimization framework to tune the performance of neural image processing systems. This framework, referred to as the NEural DEcoding COnfiguration (NEDECO) package, automatically optimizes arbitrary collections of parameters in neural image processing systems under customizable constraints. The framework allows system designers to explore alternative neural image processing trade-offs involving execution time and accuracy. NEDECO is also optimized for efficient operation on multicore platforms, which allows for faster execution of the parameter optimization process. Third, we develop a neural network inference engine targeted to mobile devices. The framework can be applied to neural network implementation in many application areas, including neural image processing. The inference engine, called ShaderNN, is the first neural network inference engine that exploits both graphics-centric abstractions (fragment shaders) and compute-centric abstractions (compute shaders). The integration of fragment shaders and compute shaders makes improved use of the parallel computing advantages of GPUs on mobile devices. ShaderNN performs especially well on parametrically small models.
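    The abstract does not detail NEDECO's search strategy, so the following is only a generic sketch of constrained parameter tuning for an accuracy/runtime trade-off of the kind described; the names tune, pipeline, and constraint are hypothetical stand-ins rather than NEDECO's actual API.

    ```python
    import random

    def tune(pipeline, param_space, constraint, n_trials=50, seed=0):
        """Random-search tuning over a parameter space (illustrative only).

        pipeline(params)    -> (accuracy, runtime_s) for one configuration
        param_space         -> {name: list of candidate values}
        constraint(acc, rt) -> True if the configuration is admissible
        Returns the admissible configuration with the best accuracy.
        """
        rng = random.Random(seed)
        best, best_acc = None, float("-inf")
        for _ in range(n_trials):
            params = {k: rng.choice(v) for k, v in param_space.items()}
            acc, rt = pipeline(params)
            if constraint(acc, rt) and acc > best_acc:
                best, best_acc = params, acc
        return best, best_acc

    # Hypothetical usage: maximize accuracy under a real-time budget.
    # best, acc = tune(my_pipeline,
    #                  {"threshold": [0.2, 0.4, 0.6], "window": [3, 5, 7]},
    #                  constraint=lambda acc, rt: rt < 0.033)  # ~30 fps
    ```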
  • High Performance Agent-Based Models with Real-Time In Situ Visualization of Inflammatory and Healing Responses in Injured Vocal Folds
    (2019) Seekhao, Nuttiiya; JaJa, Joseph; Li-Jessen, Nicole Y. K.; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The introduction of clusters of multi-core and many-core processors has played a major role in recent advances in tackling a wide range of new challenging applications and in enabling new frontiers in Big Data. However, as computing power increases, the programming complexity required to take optimal advantage of a machine's resources has increased significantly. High-performance computing (HPC) techniques are crucial in realizing the full potential of parallel computing. This research is an interdisciplinary effort focusing on two major directions. The first involves the introduction of HPC techniques to substantially improve the performance of complex biological agent-based model (ABM) simulations, more specifically simulations related to the inflammatory and healing responses of vocal folds at the physiological scale in mammals. The second direction involves improvements and extensions of the existing state-of-the-art vocal fold repair models. These improvements and extensions include comprehensive visualization of the large data sets generated by the model and a significant increase in user-simulation interactivity. We developed a highly interactive remote simulation and visualization framework for vocal fold (VF) agent-based modeling (ABM). The 3D VF ABM was verified through comparisons with empirical vocal fold data, and representative trends of biomarker predictions in surgically injured vocal folds were observed. The physiologically representative human VF ABM consisted of more than 15 million mobile biological cells, and the model maintained and generated 1.7 billion signaling and extracellular matrix (ECM) protein data points in each iteration. The VF ABM employed HPC techniques to optimize its performance by concurrently utilizing the power of a multi-core CPU and multiple GPUs. The optimization techniques included the minimization of data transfer between the CPU host and the rendering GPU; these techniques also reduced transfers between peer GPUs in multi-GPU setups. The data transfer minimization techniques were executed with a scheduling scheme that aims to achieve load balancing, maximum overlap of computation and communication, and a high degree of interactivity. This scheduling scheme achieved optimal interactivity by hyper-tasking the available GPUs (GHT). In comparison to the original serial implementation on a popular ABM framework, NetLogo, these schemes showed substantial performance improvements of 400x and 800x for the 2D and 3D models, respectively. Furthermore, the combination of data footprint and data transfer reduction techniques with GHT achieved highly interactive visualization with an average framerate of 42.8 fps. This performance enables users to perform real-time data exploration on large simulated outputs and steer the course of their simulation as needed.
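    GHT itself is described here only as oversubscribing the available GPUs, so the sketch below merely illustrates the general pattern: several worker tasks per device pull from a shared queue, so one task's transfers can overlap another's computation on the same GPU while load stays balanced dynamically. Task, transfer_to, and compute are hypothetical placeholders, not the dissertation's code.

    ```python
    from concurrent.futures import ThreadPoolExecutor
    from queue import Empty, Queue

    def run_hypertasked(tasks, n_gpus, tasks_per_gpu=2):
        """Oversubscribe each GPU with several workers (illustrative only).

        Workers draw from one shared queue, which gives dynamic load
        balancing; with tasks_per_gpu > 1, the host-to-device transfer of
        one task can overlap the kernel execution of its queue-mates.
        """
        q = Queue()
        for t in tasks:
            q.put(t)

        def worker(gpu_id):
            results = []
            while True:
                try:
                    task = q.get_nowait()
                except Empty:
                    break
                task.transfer_to(gpu_id)              # H2D copy (placeholder)
                results.append(task.compute(gpu_id))  # kernel (placeholder)
            return results

        with ThreadPoolExecutor(max_workers=n_gpus * tasks_per_gpu) as ex:
            futures = [ex.submit(worker, i % n_gpus)
                       for i in range(n_gpus * tasks_per_gpu)]
            return [r for f in futures for r in f.result()]
    ```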
  • Multi-sensor Cloud and Aerosol Retrieval Simulator and Its Applications
    (2016) Wind, Galina; Salawitch, Ross J; Platnick, Steven; Atmospheric and Oceanic Sciences; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Executing a cloud or aerosol physical properties retrieval algorithm on controlled synthetic data is an important step in retrieval algorithm development. Synthetic data can help answer questions about the sensitivity and performance of an algorithm, or aid in determining how an existing retrieval algorithm may perform with a planned sensor. Synthetic data can also help in resolving issues that may have surfaced in retrieval results. Synthetic data become very important when other validation methods, such as field campaigns, are of limited scope; campaigns tend to be of relatively short duration and are often costly, and ground stations have limited spatial coverage, while synthetic data can cover large spatial and temporal scales and a wide variety of conditions at low cost. In this work I develop an advanced cloud and aerosol retrieval simulator for the MODIS instrument, known as the Multi-sensor Cloud and Aerosol Retrieval Simulator (MCARS). In close collaboration with the modeling community, I seamlessly combined the GEOS-5 global climate model, the DISORT radiative transfer code widely used by the remote sensing community, and observations from the MODIS instrument to create the simulator. With the MCARS simulator it was then possible to resolve a long-standing issue with the MODIS aerosol optical depth retrievals, which had a low bias for smoke aerosols: the MODIS aerosol retrieval did not account for the effects of humidity on smoke aerosols. The MCARS simulator also revealed a previously unrecognized issue, namely that the value of the fine mode fraction could create a linear dependence between retrieved aerosol optical depth and land surface reflectance. MCARS provided the ability to examine aerosol retrievals against “ground truth” for hundreds of thousands of simultaneous samples over an area covered by only three AERONET ground stations. Findings from MCARS are already being used to improve the performance of operational MODIS aerosol properties retrieval algorithms. The modeling community will use MCARS data to create new parameterizations for aerosol properties as a function of properties of the atmospheric column, and will gain the ability to correct any assimilated retrieval data that may display similar dependencies in comparisons with ground measurements.
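    An artificial dependence like the one described above can be diagnosed by regressing retrieved aerosol optical depth (AOD) against surface reflectance over synthetic samples where the true optical depth is held fixed. The snippet below is a generic sketch of that diagnostic, not MCARS code, and its demo inputs are synthetic values chosen purely for illustration.

    ```python
    import numpy as np

    def aod_reflectance_dependence(aod_retrieved, surface_reflectance):
        """Fit AOD = slope * reflectance + intercept and report correlation.

        Over synthetic scenes with a fixed true AOD, a large slope and a
        high correlation flag an artificial dependence of the retrieval
        on surface reflectance.
        """
        slope, intercept = np.polyfit(surface_reflectance, aod_retrieved, 1)
        r = np.corrcoef(surface_reflectance, aod_retrieved)[0, 1]
        return slope, intercept, r

    # Hypothetical demo: constant true AOD of 0.2, with the retrieval
    # leaking surface reflectance into the retrieved value.
    rng = np.random.default_rng(0)
    refl = rng.uniform(0.05, 0.35, 10_000)
    aod = 0.2 + 0.8 * refl + rng.normal(0.0, 0.02, refl.size)
    print(aod_reflectance_dependence(aod, refl))  # slope near 0.8, r near 1
    ```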
  • Investigating the Effects of HPC Novice Programmer Variations on Code Performance
    (2007-12-07) Alameh, Rola; Basili, Victor R; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    In this thesis, we quantitatively study the effect of High Performance Computing (HPC) novice programmers' variations in effort on the performance of the code they produce. We look at effort variations from three different perspectives: total effort spent, the daily distribution of effort, and the distribution of effort over coding and debugging activities. The relationships are studied in the context of classroom studies. A qualitative study of both the effort and the performance of students was necessary in order to distinguish regular patterns and define metrics suitable for the student environment and goals. Our results suggest that total effort does not correlate with performance, and that effort spent coding contributes no more to performance than effort spent debugging. In addition, we identified a daily distribution pattern of effort that correlates with performance, suggesting that subjects who distribute their workload uniformly across days, pace themselves, and minimize interruptions achieve better performance.
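    The thesis's actual effort metrics are not given in the abstract. As a sketch of the kind of analysis described, one could summarize each subject's daily effort by its total and by a uniformity score (here, 1 minus the coefficient of variation, an assumed metric) and rank-correlate both with performance.

    ```python
    import numpy as np
    from scipy.stats import spearmanr

    def effort_metrics(daily_hours):
        """Summarize one subject's daily effort (illustrative metrics)."""
        daily_hours = np.asarray(daily_hours, dtype=float)
        total = daily_hours.sum()
        mean = daily_hours.mean()
        # Uniformity: 1 - coefficient of variation (1.0 = perfectly even).
        uniformity = 1.0 - daily_hours.std() / mean if mean > 0 else 0.0
        return total, uniformity

    def correlate_with_performance(subjects_daily_hours, performance):
        totals, uniformities = zip(
            *(effort_metrics(s) for s in subjects_daily_hours))
        # Per the abstract's findings, one would expect the first
        # correlation to be weak and the second to be positive.
        return (spearmanr(totals, performance),
                spearmanr(uniformities, performance))
    ```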