A. James Clark School of Engineering

Permanent URI for this community: http://hdl.handle.net/1903/1654

The collections in this community comprise faculty research works, as well as graduate theses and dissertations.

Search Results

Now showing 1 - 3 of 3
  • Item
    A Scalable Time-Parallel Solution of Periodic Dynamics for Three-Dimensional Rotorcraft Aeromechanics
    (2022) Patil, Mrinalgouda; Datta, Anubhav; Aerospace Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Computational time, the principal barrier to rotorcraft trim solutions using high-fidelity three-dimensional (3D) structures on real rotor problems, was overcome with parallel and scalable algorithms. These algorithms were devised by leveraging modern supercomputer architecture. The resulting parallel X3D solver was used to investigate advanced coaxial rotors using a notional hingeless rotor test case, Metaltail. This investigation included rotor performance, blade airloads, vibratory hub loads, and three-dimensional stresses. The technical approach consisted of first studying existing algorithms for periodic rotor dynamics: time marching, finite element in time (FET), and harmonic balance. The feasibility of these algorithms was studied for large-scale rotor structures, and drawbacks were identified. Modifications were then performed on the harmonic balance method to obtain a Modified Harmonic Balance (MHB) method. A parallel algorithm for a skyline solver was devised on shared memory to obtain faster solutions to large linear systems of equations. The MHB method was implemented on a hybrid distributed-shared memory architecture to allow for parallel computations of harmonics. These developed algorithms were then integrated into the X3D solver to obtain a new parallel X3D. The new parallel X3D was verified and validated in hover and forward flight conditions for both idealized and real rotor test cases. A total of four test cases were studied: 1) uniform beam, 2) Frank Harris rotor, 3) UH-60A-like Black Hawk rotor, and 4) NASA Tilt Rotor Aeroacoustic Model (TRAM). The predictions of tip displacements, airloads, and stress distributions from the MHB algorithm showed good agreement with the test data and time marching predictions. The key conclusion is that the new solver converges to the time marching solution 50-70 times faster and achieves a performance greater than 1 teraFLOPS.
The new parallel X3D solver opened the opportunity for modeling advanced rotor configurations. In this work, the coaxial rotor was the selected configuration. Two open-access models were developed: 1) a notional hingeless coaxial rotor, and 2) a notional articulated UH-60A-like coaxial rotor. The aerodynamics, structural dynamics, and trim modules of X3D were expanded for coaxial modeling. The coaxial aerodynamics was validated with hover performance data from the U.S. Army model test. The coaxial solver was then used to study rotor aeromechanics in forward flight. The analysis was performed at a low-speed transition flight condition for which qualitative data is available from the Sikorsky S-97 Raider aircraft for comparison. The UH-60A coaxial airloads showed good agreement with the S-97 data, likely because the blade twists are similar. However, the Metaltail model showed dissimilarities, and the cause was traced to its high twist. The variation of vibratory hub loads with advance ratio was studied, and the maximum vibration occurred at the transition flight speed ($\mu = 0.1$ to $0.15$), which was consistent with the S-97 data. The effect of the inter-rotor phase was examined for the reduction of vibratory hub loads. Three-dimensional stresses and strains were predicted and visualized for the first time on lift offset coaxial rotors in the blade and the hub.
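
The remark above that harmonics can be computed in parallel can be illustrated with a minimal sketch. It is not the MHB method or X3D code. It assumes a hypothetical single-degree-of-freedom linear oscillator, $m\ddot{x} + c\dot{x} + kx = f(t)$ with period $2\pi/\Omega$, for which substituting a Fourier series yields one independent equation per harmonic $n$: $(k - (n\Omega)^2 m + i n \Omega c)\,X_n = F_n$. The function and parameter names are illustrative only; a real rotor problem is nonlinear, so its harmonics couple and need more than this.

```python
import cmath
import math

def harmonic_balance(m, c, k, omega, forcing, n_harmonics, n_samples=256):
    """Periodic steady state of m*x'' + c*x' + k*x = forcing(t), period 2*pi/omega.

    Returns a dict mapping harmonic number n to the complex coefficient X_n.
    """
    period = 2.0 * math.pi / omega
    times = [period * j / n_samples for j in range(n_samples)]
    samples = [forcing(t) for t in times]
    coeffs = {}
    for n in range(-n_harmonics, n_harmonics + 1):
        # Fourier coefficient F_n of the forcing (rectangle rule over one period).
        f_n = sum(s * cmath.exp(-1j * n * omega * t)
                  for s, t in zip(samples, times)) / n_samples
        # Each harmonic is an independent algebraic equation in this linear case,
        # so the iterations of this loop could be distributed across processors.
        coeffs[n] = f_n / (k - (n * omega) ** 2 * m + 1j * n * omega * c)
    return coeffs

def evaluate(coeffs, omega, t):
    """Reconstruct x(t) from the harmonic coefficients."""
    return sum(x_n * cmath.exp(1j * n * omega * t) for n, x_n in coeffs.items()).real
```

For a forcing of `math.cos(omega * t)`, the reconstructed response can be checked against the closed-form steady state of the damped oscillator, which is a convenient sanity test before moving to coupled systems.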
  • Item
    Easy PRAM-based High-performance Parallel Programming with ICE
    (2016-08-31) Ghanim, Fady; Barua, Rajeev; Vishkin, Uzi
    Parallel machines have become more widely used. Unfortunately, parallel programming technologies have advanced at a much slower pace, except for regular programs. For irregular programs, this advancement is inhibited by high synchronization costs, non-loop parallelism, non-array data structures, recursively expressed parallelism, and parallelism that is too fine-grained to be exploitable. We present ICE, a new parallel programming language that is easy to program, since: (i) ICE is a synchronous, lock-step language; (ii) for a PRAM algorithm, its ICE program amounts to directly transcribing it; and (iii) the PRAM algorithmic theory offers a unique wealth of parallel algorithms and techniques. We propose ICE to be a part of an ecosystem consisting of the XMT architecture, the PRAM algorithmic model, and ICE itself, that together deliver on the twin goals of easy programming and efficient parallelization of irregular programs. The XMT architecture, developed at UMD, can exploit fine-grained parallelism in irregular programs. We built the ICE compiler, which translates the ICE language into the multithreaded XMTC language; the significance of this is that multithreading is a feature shared by practically all current scalable parallel programming languages. As one indication of ease of programming, we observed a reduction in code size in 7 out of 11 benchmarks vs. XMTC. For these programs, the average reduction in number of lines of code was when compared to hand-optimized XMTC. The remaining 4 benchmarks had the same code size. Our main result is perhaps surprising: the run-time was comparable to XMTC, with a 0.76% average gain for ICE across all benchmarks.
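
To make "synchronous, lock-step" concrete, here is a minimal sketch in Python. It is not ICE or XMTC syntax, which the abstract does not show. It simulates pointer jumping for list ranking, a classic PRAM algorithm for an irregular, pointer-based problem, chosen here purely as an illustration. Each round reads only the old arrays and writes fresh ones, which mimics every processor executing the same step in lock-step. The function name and the convention that the tail points to itself are assumptions of this sketch.

```python
def list_rank(successor):
    """Rank each node by its distance to the tail of a linked list.

    successor[i] is the index of the node after i; the tail points to itself.
    Returns (ranks, number_of_lock_step_rounds).
    """
    n = len(successor)
    nxt = list(successor)
    # Invariant: rank[i] is the distance from node i to node nxt[i].
    rank = [0 if nxt[i] == i else 1 for i in range(n)]
    rounds = 0
    while any(nxt[i] != nxt[nxt[i]] for i in range(n)):
        # Lock-step round: every "processor" i reads the old arrays,
        # and all writes land together in new arrays.
        new_rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        new_nxt = [nxt[nxt[i]] for i in range(n)]
        rank, nxt = new_rank, new_nxt
        rounds += 1
    return rank, rounds
```

Because every round doubles the distance each pointer spans, a list of n nodes needs about log2(n) rounds, even though the nodes may be stored in arbitrary order.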
  • Item
    High-Performance Computing Algorithms for Constructing Inverted Files on Emerging Multicore Processors
    (2012) Wei, Zheng; JaJa, Joseph F; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Current trends in processor architectures increasingly include more cores on a single chip and more complex memory hierarchies, and such a trend is likely to continue in the foreseeable future. These processors offer unprecedented opportunities for speeding up demanding computations if the available resources can be effectively utilized. Simultaneously, parallel programming languages such as OpenMP and MPI have been commonly used on clusters of multicore CPUs, while newer programming languages such as OpenCL and CUDA have been widely adopted on recent heterogeneous systems and GPUs, respectively. The main goal of this dissertation is to develop techniques and methodologies for exploiting these emerging parallel architectures and parallel programming languages to solve large-scale irregular applications such as the construction of inverted files. The extraction of inverted files from large collections of documents forms a critical component of all information retrieval systems, including web search engines. In this problem, the disk I/O throughput is the major performance bottleneck, especially when intermediate results are written onto disks. In addition to the I/O bottleneck, a number of synchronization and consistency issues must be resolved in order to build the dictionary and postings lists efficiently. To address these issues, we introduce a dictionary data structure using a hybrid of a trie and B-trees and a high-throughput pipeline strategy that completely avoids the use of disks as temporary storage for intermediate results, while ensuring the consumption of the input data at a high rate. The high-throughput pipelined strategy produces parallel parsed streams that are consumed at the same rate by parallel indexers. The pipelined strategy is implemented on a single multicore CPU as well as on a cluster of such nodes. We were able to achieve a throughput of more than 262 MB/s on the ClueWeb09 dataset on a single node.
On a cluster of 32 nodes, our experimental results show scalable performance using different metrics, significantly improving on prior published results. We also develop a new approach for handling time-evolving documents using additional small temporal indexing structures. The lifetime of the collection is partitioned into multiple time windows, which guarantees a very fast temporal query response time at a small space overhead relative to the non-temporal case. Extensive experimental results indicate that the overhead in both indexing and querying is small in this more complicated case, and the query performance can indeed be improved using finer temporal partitioning of the collection. Finally, we employ GPUs to accelerate the indexing process for building inverted files and to develop a very fast algorithm for the highly irregular list ranking problem. For the indexing problem, the workload is split between CPUs and GPUs in such a way that the strengths of both architectures are exploited. For the list ranking problem involved in the decompression of inverted files, an optimized GPU algorithm is introduced by reducing the problem to a large number of fine-grained computations in such a way that the processing cost per element is shown to be close to the best possible.
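
For readers unfamiliar with the terms, the dictionary and postings lists that make up an inverted file can be sketched as follows. This is a deliberately simplified, single-threaded illustration. It assumes a plain Python dict in place of the hybrid trie and B-tree dictionary, whitespace tokenization, and no pipelining, compression, or temporal windows. The function name and sample documents are hypothetical.

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map each term to a sorted postings list of (doc_id, term_frequency)."""
    index = defaultdict(dict)  # the "dictionary": term -> {doc_id: frequency}
    for doc_id, text in enumerate(documents):
        # Parsing step: lowercase and split on whitespace (an assumption).
        for term in text.lower().split():
            # Indexing step: update the postings for this term.
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    # Freeze each postings map into a list ordered by document id.
    return {term: sorted(postings.items()) for term, postings in index.items()}

docs = [
    "parallel indexers consume parsed streams",
    "parsed streams feed parallel indexers",
    "inverted files map terms to documents",
]
index = build_inverted_index(docs)
```

A query for a term then reduces to a dictionary lookup followed by a scan of its postings list, for example `index["parallel"]`. In the approach described above, the parsing and indexing steps run as separate parallel pipeline stages rather than in one loop.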