Theses and Dissertations from UMD

Permanent URI for this community: http://hdl.handle.net/1903/2

New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means that there may be up to a 4-month delay in the appearance of a given thesis/dissertation in DRUM.

More information is available at Theses and Dissertations at University of Maryland Libraries.

Simulating Bursty and Continuous Reionization Using GPU Computing
(2023) Hartley, Blake Teixeira; Ricotti, Massimo; Astronomy; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Reionization is the process by which the neutral intergalactic medium of the early universe was ionized by the first galaxies. It took place somewhere between roughly redshift 30 and redshift 6, or from roughly 100 Myr to 1 Gyr into the history of the universe. The details of this transition are still not well understood, but observational constraints suggest that reionization happened faster than naive estimates would suggest. In this thesis, we investigate the theory that galaxies which form their stars in short bursts could complete reionization faster than galaxies which emit their photons continuously over their lifespans. We began investigating this theory with a semi-analytic model of the early universe. We used analytic methods to model the expansion of H II (ionized hydrogen) regions around isolated galaxies, as well as the behavior of the remnant H II regions after star formation ceases. We then compiled assortments of galaxies matching dark matter simulation profiles and associated each with an H II region that could either grow continuously or grow quickly before entering a dormant period of recombination. These tests indicated that the remnants of bursty star formation had lower overall recombination rates than those of continuously expanding H II regions, and that these remnants could allow ionizing radiation from more distant sources to influence ionization earlier. We decided that the next step towards demonstrating the differences between continuous and bursty star formation would require a more accurate model of the early universe. We chose a photon-conserving ray tracing algorithm which follows the paths of millions of rays from each galaxy and calculates the ionization rate at every point in a uniform 3D grid. The massive amount of computation required for such an algorithm led us to choose MPI as the framework for building our simulation. MPI allowed us to break the grid into 8 sub-volumes, each of which could be assigned to a node on a supercomputer.
We then used CUDA to track the millions of rays, with each of the thousands of CUDA cores handling a single ray. Creating our own simulation library gave us complete control over the distribution and time dependence of ionizing radiation emission, which is critical to isolating the effect of bursty star formation on reionization. Once the library was complete, we used it to conduct a suite of simulations across a selection of model parameters. Every set of model parameters we selected corresponds to two models, one continuous and one bursty. This selection allowed us to isolate the effect of bursty star formation on the results of the simulations. We found that the effects we hoped to see were present in our simulations, and obtained simple estimates of the size of these effects.
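The abstract above says only that the uniform grid was broken into 8 sub-volumes, one per node. As a rough illustration of what such a split could look like, the sketch below assumes a 2 x 2 x 2 octant layout on a cubic grid with an even number of cells per side. The function names `octant_bounds` and `owner_rank` are hypothetical and do not come from the thesis, and the sketch is written in plain Python rather than MPI.

```python
def octant_bounds(rank, n):
    """Half-open cell ranges ((x0, x1), (y0, y1), (z0, z1)) owned by `rank`.

    Assumption (not from the thesis): the n x n x n grid is split into
    2 x 2 x 2 octants, and rank bits 0, 1, 2 select the x, y, z half.
    """
    if not 0 <= rank < 8:
        raise ValueError("rank must be in 0..7")
    if n % 2 != 0:
        raise ValueError("grid size must be even")
    half = n // 2
    bits = (rank & 1, (rank >> 1) & 1, (rank >> 2) & 1)
    return tuple((b * half, (b + 1) * half) for b in bits)


def owner_rank(x, y, z, n):
    """Rank that owns cell (x, y, z) under the same octant layout."""
    half = n // 2
    return int(x >= half) + 2 * int(y >= half) + 4 * int(z >= half)
```

With this layout, a ray that leaves one octant can look up the rank that owns the next cell with `owner_rank`, which is the kind of lookup a distributed ray tracer would need when handing rays between nodes.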
A Time Parallel Approach to Numerical Simulation of Asymptotically Stable Dynamical Systems with Application to CFD Models of Helicopter Rotors
(2023) Silbaugh, Benjamin Scott; Baeder, James D; Aerospace Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Modern High Performance Computing (HPC) machines are distributed memory clusters, consisting of multi-core compute nodes. Engineering simulation and analysis tools must employ efficient parallel algorithms in order to fully utilize the compute capability of modern HPC machines. The trend in Computational Fluid Dynamics (CFD) has been to construct parallel solution algorithms based on some form of spatial domain decomposition. This approach has been shown to be a success for many practical applications. However, as one attempts to utilize more compute cores, limitations in strong scalability are inevitably reached due to a diminishing compute workload per compute core and either fixed or increasing communication cost. Furthermore, spatial domain decomposition approaches cannot be easily applied to mid-fidelity structural dynamics or rigid body dynamics models. A significant majority of industrial fluid and structural dynamic models utilize some form of time marching. Thus, if the domain decomposition strategy can be extended to include the temporal dimension, additional opportunity for increased parallelism may be realized. A new form of periodic multiple shooting is proposed that is matrix-free and may be applied to high-fidelity multiphysics models or other high dimensional systems. The proposed methodology is formulated entirely in the time domain. Therefore, existing time-domain simulation tools may utilize the proposed approach to achieve a high degree of distributed memory parallelism without requiring any reformulation. Furthermore, the proposed methodology may be combined with conventional space domain decomposition techniques and other forms of data parallelism to achieve maximal performance on modern HPC architectures. The proposed algorithm retains the iterative shoot-correct approach of conventional periodic shooting methods.
However, the correction stage is formulated using a hierarchical evaluation strategy combined with an Arnoldi subspace approximation to eliminate the need for explicit formulation of Jacobian matrices. The local convergence of the proposed method is formally proven for the case of an asymptotically stable dynamical system. The proposed method is numerically tested for a 2D limit cycle problem, a rigid blade helicopter rotor model with quasi-steady aerodynamics and autopilot trim, and an OVERSET CFD model of a helicopter rotor with prescribed elastic blade motions. The method is observed to be convergent in all test cases and found to exhibit good scalability. The proposed periodic multiple shooting method is a practical means of reducing time-to-solution for numerical simulations of asymptotically stable periodic systems on distributed memory parallel computers. Furthermore, the proposed method may be used to enhance the parallel scalability of OVERSET CFD models of helicopter rotors in steady periodic flight.
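To make the "shoot-correct" idea mentioned in the abstract concrete, here is a minimal sketch of conventional single shooting on a scalar test problem. It is a deliberate simplification and not the thesis method: it uses one time segment rather than multiple shooting, and a finite-difference slope rather than the matrix-free Arnoldi approximation. The test equation x' = -x + cos(t), the function names, and the step counts are all assumptions chosen for illustration; the exact periodic solution is x(t) = (cos t + sin t) / 2, so the periodic initial value is 0.5.

```python
import math


def rk4_flow(f, x0, t0, t1, steps=2000):
    """Integrate x' = f(t, x) from t0 to t1 with classical RK4; return x(t1)."""
    h = (t1 - t0) / steps
    t, x = t0, x0
    for _ in range(steps):
        k1 = f(t, x)
        k2 = f(t + h / 2, x + h / 2 * k1)
        k3 = f(t + h / 2, x + h / 2 * k2)
        k4 = f(t + h, x + h * k3)
        x += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return x


def periodic_shooting(f, period, x0=0.0, tol=1e-10, max_iter=20, eps=1e-6):
    """Find x0 such that x(period) == x0 by Newton iteration on the residual."""
    for _ in range(max_iter):
        # Shoot: march one full period and measure the periodicity mismatch.
        residual = rk4_flow(f, x0, 0.0, period) - x0
        if abs(residual) < tol:
            break
        # Correct: Newton step using a finite-difference slope of the residual.
        shifted = rk4_flow(f, x0 + eps, 0.0, period) - (x0 + eps)
        slope = (shifted - residual) / eps
        x0 -= residual / slope
    return x0


def forced_decay(t, x):
    """Asymptotically stable test system with period 2*pi forcing (assumed)."""
    return -x + math.cos(t)


x_star = periodic_shooting(forced_decay, 2 * math.pi)
```

In multiple shooting, the period would be cut into several segments that are marched independently, which is where the time parallelism described in the abstract comes from; the correction step then has to reconcile the mismatches at all segment boundaries at once.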
A Scalable Time-Parallel Solution of Periodic Dynamics for Three-Dimensional Rotorcraft Aeromechanics
(2022) Patil, Mrinalgouda; Datta, Anubhav; Aerospace Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
The principal barrier of computational time for rotorcraft trim solution using high-fidelity three-dimensional (3D) structures on real rotor problems was overcome with parallel and scalable algorithms. These algorithms were devised by leveraging the modern supercomputer architecture. The resulting parallel X3D solver was used to investigate advanced coaxial rotors using a notional hingeless rotor test case, Metaltail. This investigation included rotor performance, blade airloads, vibratory hub loads, and three-dimensional stresses. The technical approach consisted of first studying existing algorithms for periodic rotor dynamics: time marching, finite element in time (FET), and harmonic balance. The feasibility of these algorithms was studied for large-scale rotor structures, and drawbacks were identified. Modifications were then performed on the harmonic balance method to obtain a Modified Harmonic Balance (MHB) method. A parallel algorithm for a skyline solver was devised on shared memory to obtain faster solutions to large linear systems of equations. The MHB method was implemented on a hybrid distributed-shared memory architecture to allow for parallel computations of harmonics. These developed algorithms were then integrated into the X3D solver to obtain a new parallel X3D. The new parallel X3D was verified and validated in hover and forward flight conditions for both idealized and real rotor test cases. A total of four test cases were studied: 1) uniform beam, 2) Frank Harris rotor, 3) UH-60A-like Black Hawk rotor, and 4) NASA Tilt Rotor Aeroacoustic Model (TRAM). The predictions of tip displacements, airloads, and stress distributions from the MHB algorithm showed good agreement with the test data and time marching predictions. The key conclusion is that the new solver converges to the time marching solution 50-70 times faster and achieves a performance greater than 1 teraFLOPS.
The new parallel X3D solver opened the opportunity for modeling advanced rotor configurations. In this work, the coaxial rotor was the selected configuration. Two open access models were developed: 1) a notional hingeless coaxial rotor, and 2) a notional articulated UH-60A-like coaxial rotor. The aerodynamics, structural dynamics, and trim modules of X3D were expanded for coaxial modeling. The coaxial aerodynamics was validated with hover performance data from the U.S. Army model test. The coaxial solver was then used to study rotor aeromechanics in forward flight. The analysis was performed in low-speed transition flight, for which qualitative data is available for the Sikorsky S-97 Raider aircraft for comparison. The UH-60A coaxial airloads showed good agreement with the S-97 data as the twists are likely similar. However, the Metaltail model showed dissimilarities, and the cause was investigated and attributed to its high twist. Vibratory hub loads were studied as a function of advance ratio, and the maximum vibration occurred at the transition flight speed (μ = 0.1 to 0.15), which was consistent with the S-97 data. The effect of the inter-rotor phase was examined for the reduction of vibratory hub loads. Three-dimensional stresses and strains were predicted and visualized for the first time on lift offset coaxial rotors in the blade and the hub.
Data-centric Performance Measurement and Mapping for Highly Parallel Programming Models
(2018) Zhang, Hui; Hollingsworth, Jeffrey K.; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Modern supercomputers have complex features: many hardware threads, deep memory hierarchies, and many co-processors/accelerators. Productively and effectively designing programs to utilize those hardware features is crucial in gaining the best performance. There are several highly parallel programming models in active development that allow programmers to write efficient code on those architectures. Performance profiling is an important technique for achieving the best performance during development. In this dissertation, I proposed a new performance measurement and mapping technique that can associate performance data with program variables instead of code blocks. To validate the applicability of my data-centric profiling idea, I designed and implemented profilers for PGAS and CUDA. For PGAS, I developed ChplBlamer, for both single-node and multi-node Chapel programs. My tool also provides new features such as data-centric inter-node load imbalance identification. For CUDA, I developed CUDABlamer for GPU-accelerated applications. CUDABlamer also attributes performance data to program variables, which is a feature that was not found in any previous CUDA profiler. Directed by the insights from the tools, I optimized several widely-studied benchmarks and significantly improved program performance by a factor of up to 4x for Chapel and 47x for CUDA kernels.
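The core idea in the abstract above, reporting cost per program variable rather than per code block, can be illustrated with a toy aggregation. This sketch does not reproduce ChplBlamer or CUDABlamer: the sample format, the `line_to_vars` mapping, the variable names, and the rule of splitting a sample's cost equally among the variables on a line are all assumptions made for illustration.

```python
from collections import defaultdict


def attribute_to_variables(samples, line_to_vars):
    """Aggregate (line, cost) samples into a per-variable cost table.

    Assumption: each sample's cost is split equally among the variables
    associated with its source line; samples on unmapped lines are dropped.
    """
    totals = defaultdict(float)
    for line, cost in samples:
        variables = line_to_vars.get(line, ())
        for name in variables:
            totals[name] += cost / len(variables)
    return dict(totals)


# Hypothetical profile: (source line, sampled cost in milliseconds).
samples = [(10, 4.0), (11, 2.0), (10, 2.0), (20, 1.0)]
# Hypothetical mapping from source lines to the variables they update.
line_to_vars = {10: ["grid"], 11: ["grid", "halo"]}

by_variable = attribute_to_variables(samples, line_to_vars)
```

A code-centric view of the same samples would report line 10 as the hotspot; the variable-centric view instead shows that most of the cost is tied to `grid`, regardless of which lines touch it.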
