Theses and Dissertations from UMD

Permanent URI for this community: http://hdl.handle.net/1903/2

New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means that there may be up to a 4-month delay in the appearance of a given thesis/dissertation in DRUM.

More information is available at Theses and Dissertations at University of Maryland Libraries.

Search Results

Now showing 1 - 10 of 15
  • Item
    Theory, Design, and Implementation of Landmark Promotion Cooperative Simultaneous Localization and Mapping
    (2011) Karvounis, John George; Blankenship, Gilmer; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Simultaneous Localization and Mapping (SLAM) is a challenging problem in practice, and the use of multiple robots and inexpensive sensors places even greater demands on the designer. Cooperative SLAM poses specific challenges in the areas of computational efficiency, software/network performance, and robustness to errors. New methods in image processing, recursive filtering, and SLAM have been developed to implement practical algorithms for cooperative SLAM on a set of inexpensive robots. The Consolidated Unscented Mixed Recursive Filter (CUMRF) is designed to handle non-linear systems with non-Gaussian noise. This is accomplished using the Unscented Transform combined with Gaussian Mixture Models. The Robust Kalman Filter is an extension of the Kalman Filter algorithm that improves the ability to remove erroneous observations using Principal Component Analysis (PCA) and the X84 outlier rejection rule. Forgetful SLAM is a local SLAM technique that runs in nearly constant time relative to the number of visible landmarks and improves poorly performing sensors through sensor fusion and outlier rejection. Forgetful SLAM correlates all measured observations but stops the state from growing over time. Hierarchical Active Ripple SLAM (HAR-SLAM) is a new SLAM architecture that breaks the traditional state space of SLAM into a chain of smaller state spaces, allowing multiple robots, multiple sensors, and multiple updates to occur in linear time with linear storage with respect to the number of robots, landmarks, and robot poses. This dissertation presents explicit methods for closing the loop, joining multiple robots, and performing active updates. Landmark Promotion SLAM is a hierarchy of new SLAM methods that uses the Robust Kalman Filter, Forgetful SLAM, and HAR-SLAM. Practical aspects of SLAM are a focus of this dissertation. LK-SURF is a new image processing technique that combines Lucas-Kanade feature tracking with Speeded-Up Robust Features to perform spatial and temporal tracking. Typical stereo correspondence techniques fail to provide descriptors for features or fail at temporal tracking. Several calibration and modeling techniques are also covered, including calibrating stereo cameras, aligning stereo cameras to an inertial system, and building neural-network system models. These methods are important for improving the quality of the data and images acquired for the SLAM process.
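    The X84 rule mentioned above is a standard robust-statistics criterion: an observation is rejected when its residual lies more than a fixed multiple of the median absolute deviation (MAD) from the median. A minimal sketch of that rule (illustrative only, not the dissertation's implementation; the threshold k = 5.2 is the commonly cited value, roughly 3.5 standard deviations under Gaussian noise):

```python
import numpy as np

def x84_outlier_mask(residuals, k=5.2):
    """Flag outliers via the X84 rule: reject values farther than
    k * MAD (median absolute deviation) from the median."""
    residuals = np.asarray(residuals, dtype=float)
    med = np.median(residuals)
    mad = np.median(np.abs(residuals - med))
    return np.abs(residuals - med) > k * mad

# Example: innovations from a filter update; the large value is rejected.
innovations = [0.1, -0.2, 0.05, 4.8, 0.15, -0.1]
print(x84_outlier_mask(innovations))  # -> [False False False  True False False]
```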
  • Item
    An Integrated Program Representation for Loop Optimizations
    (2011) Yellareddy, Greeshma; Barua, Rajeev; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Despite all the advances, automatic parallelization has not entered the general-purpose compiling environment, for several reasons. There have been two distinct schools of thought in the parallelization domain, namely affine and non-affine, which have remained incompatible with each other over the years. Thus, a good practical compiler will have to be able to analyze and parallelize any type of code - affine, non-affine, or a mix of both. To achieve the best performance, compilers will have to derive the order of transformations best suited for a given program on a given system. This problem, known as "Phase Ordering", is a crucial impediment for practical compilers, more so for parallelizing compilers. The ideal compiler should be able to consider various orders of transformations and reason about the performance benefits of each. In order to achieve such a compiler, in this paper, we propose a unified program representation with the following characteristics: a) it is modular in nature; b) it can represent both affine and non-affine transformations; c) it allows detailed static run-time estimators to be used directly on the representation.
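    For readers unfamiliar with the affine/non-affine split the abstract refers to, a hypothetical sketch: an array access is affine when its subscripts are linear functions of the enclosing loop indices, which is what lets polyhedral-style analyses reason about dependences exactly; indirect subscripts break that property. The loop bodies below are invented purely for illustration.

```python
# Illustrative only: the two classes of loops that a unified program
# representation would need to cover. Array names and shapes are hypothetical.

def affine_kernel(a, b, n):
    # Subscripts (i, i + 1) are linear in the loop index, so dependences
    # can be analyzed exactly at compile time (polyhedral/affine methods).
    for i in range(n - 1):
        a[i] = a[i] + b[i + 1]

def non_affine_kernel(a, b, idx, n):
    # The subscript goes through an index array (a[idx[i]]), so dependences
    # are unknown until run time; non-affine techniques (run-time tests,
    # speculation, inspector/executor) are needed to parallelize safely.
    for i in range(n):
        a[idx[i]] = b[i] + 1.0
```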
  • Item
    Runtime Enforcement of Memory Safety for the C Programming Language
    (2011) Simpson, Matthew Stephen; Barua, Rajeev; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Memory access violations are a leading source of unreliability in C programs. Although the low-level features of the C programming language, like unchecked pointer arithmetic and explicit memory management, make it a desirable language for many programming tasks, their use often results in hard-to-detect memory errors. As evidence of this problem, a variety of methods exist for retrofitting C with software checks to detect memory errors at runtime. However, these techniques generally suffer from one or more practical drawbacks that have thus far limited their adoption. These weaknesses include the inability to detect all spatial and temporal violations, the use of incompatible metadata, the need for manual code modifications, and the tremendous runtime cost of providing complete safety. This dissertation introduces MemSafe, a compiler analysis and transformation for ensuring the memory safety of C programs at runtime while avoiding the above drawbacks. MemSafe makes several novel contributions that improve upon previous work and lower the runtime cost of achieving memory safety. These include (1) a method for modeling temporal errors as spatial errors, (2) a hybrid metadata representation that combines the most salient features of both object- and pointer-based approaches, and (3) a data-flow representation that simplifies optimizations for removing unneeded checks and unused metadata. Experimental results indicate that MemSafe is capable of detecting memory safety violations in real-world programs with lower runtime overhead than previous methods. Results show that MemSafe detects all known memory errors in multiple versions of two large and widely-used open source applications as well as six programs from a benchmark suite specifically designed for the evaluation of error detection tools. MemSafe enforces complete safety with an average overhead of 88% on 30 widely-used performance evaluation benchmarks. In comparison with previous work, MemSafe's average runtime overhead for one common benchmark suite (29%) is a fraction of that associated with the previous technique (133%) that, until now, had the lowest overhead among all existing complete and automatic methods that are capable of detecting both spatial and temporal violations.
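    As a rough illustration of the pointer-metadata idea behind such runtime checks (a generic sketch, not MemSafe's actual hybrid representation or instrumentation), the class below attaches base/bound metadata to a pointer-like object and checks every dereference against it; the names are invented for the example.

```python
class CheckedPtr:
    """A pointer-like object carrying base/bound metadata (hypothetical sketch)."""

    def __init__(self, buf, offset=0, base=0, bound=None):
        self.buf, self.offset = buf, offset
        self.base = base
        self.bound = len(buf) if bound is None else bound

    def add(self, k):
        # Out-of-bounds pointer arithmetic is legal in C; only the
        # dereference is checked, so arithmetic just propagates the metadata.
        return CheckedPtr(self.buf, self.offset + k, self.base, self.bound)

    def load(self):
        # A dereference outside [base, bound) is a spatial violation.
        if not (self.base <= self.offset < self.bound):
            raise MemoryError(f"spatial violation at offset {self.offset}")
        return self.buf[self.offset]


p = CheckedPtr([1, 2, 3])
print(p.add(2).load())        # in bounds -> 3
try:
    p.add(3).load()           # one past the end -> detected at runtime
except MemoryError as e:
    print(e)
```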
  • Item
    Parallel Computation of Nonrigid Image Registration
    (2011) Leung, Frances Kimpik; Shekhar, Raj; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Automatic intensity-based nonrigid image registration has significant impact in medical applications such as multimodality fusion of images, serial comparison for monitoring disease progression or regression, and minimally invasive image-guided interventions. However, due to the memory- and compute-intensive nature of the operations, intensity-based image registration has remained too slow to be practical for clinical adoption, with its use limited primarily to that of a pre-operative tool. Efficient registration methods can lead to new possibilities for the development of improved and interactive intraoperative tools and capabilities. In this thesis, we propose an efficient parallel implementation of intensity-based three-dimensional nonrigid image registration on a commodity graphics processing unit. Optimization techniques are developed to accelerate the compute-intensive mutual information computation. The study is performed on the hierarchical volume subdivision-based algorithm, which is inherently faster than other nonrigid registration algorithms and structurally well suited for data-parallel computation platforms. The proposed implementation achieves more than a 50-fold runtime improvement over a standard implementation on a CPU. The execution time of nonrigid image registration is reduced from hours to minutes while retaining the same level of registration accuracy.
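    Mutual information, the similarity measure whose evaluation dominates the registration runtime, can be computed from the joint intensity histogram of the two images. A NumPy sketch of that computation follows (a plain software illustration, not the thesis's GPU implementation; the bin count is arbitrary):

```python
import numpy as np

def mutual_information(fixed, moving, bins=64):
    """Mutual information of two images from their joint intensity histogram:
    MI = sum over bins of p(f,m) * log( p(f,m) / (p(f) * p(m)) )."""
    joint, _, _ = np.histogram2d(fixed.ravel(), moving.ravel(), bins=bins)
    p_fm = joint / joint.sum()
    p_f = p_fm.sum(axis=1, keepdims=True)     # marginal of the fixed image
    p_m = p_fm.sum(axis=0, keepdims=True)     # marginal of the moving image
    nz = p_fm > 0                             # avoid log(0)
    return np.sum(p_fm[nz] * np.log(p_fm[nz] / (p_f @ p_m)[nz]))

# Identical images give high MI; independent images give a much lower value.
img = np.random.rand(64, 64)
print(mutual_information(img, img))
print(mutual_information(img, np.random.rand(64, 64)))
```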
  • Item
    High-Performance DRAM System Design Constraints and Considerations
    (2010) Gross, Joseph; Jacob, Bruce L; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The effects of a realistic memory system have not received much attention in recent decades. Often, the memory controller and DRAMs are modeled as a fixed-latency or random-latency system, which leads to simulations that are less accurate. As more cores are added to each die and CPU clock rates continue to outpace memory access times, the gap will only grow wider and simulation results will become even less accurate. This thesis proposes to look at the way a memory controller and DRAM system work and to model them accurately in a simulator. It uses a simulated Alpha 21264 processor in conjunction with a full-system simulator and a memory system simulator. Various SPEC06 benchmarks are used to examine runtimes. The process of mapping a memory location to a physical location, the algorithm for ordering the commands sent to the DRAMs, and the method of managing the row buffers are examined in detail. We find that the choice of these algorithms and policies can affect application runtime by 200% or more. It is also shown that energy use can vary by up to 300% by changing the address mapping policy. These results show that it is important to consider all the available policies to optimize the memory system for the type of workload that a machine will be running. No single policy is best for every application, so it is important to understand the interaction of the application and the memory system to improve performance and reduce the energy consumed.
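    As a concrete example of what an address mapping policy decides, the sketch below splits a physical address into column, bank, rank, and row fields in one particular order; choosing a different field order is exactly the kind of policy change the thesis evaluates. The field widths and ordering here are made up for illustration.

```python
def map_address(addr, col_bits=10, bank_bits=3, rank_bits=1, row_bits=14):
    """Split a physical address into DRAM coordinates using one
    (hypothetical) field order: row | rank | bank | column."""
    col = addr & ((1 << col_bits) - 1)
    addr >>= col_bits
    bank = addr & ((1 << bank_bits) - 1)
    addr >>= bank_bits
    rank = addr & ((1 << rank_bits) - 1)
    addr >>= rank_bits
    row = addr & ((1 << row_bits) - 1)
    return {"row": row, "rank": rank, "bank": bank, "col": col}

# Consecutive cache-line addresses fall in the same row of the same bank
# under this ordering, which favors row-buffer hits for streaming accesses.
for a in (0x0000, 0x0040, 0x0080):
    print(hex(a), map_address(a))
```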
  • Item
    CROSS-LAYER CUSTOMIZATION FOR LOW POWER AND HIGH PERFORMANCE EMBEDDED MULTI-CORE PROCESSORS
    (2010) Yu, Chenjie; Petrov, Peter; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Due to physical limitations and design difficulties, computer processor architecture has shifted to multi-core and even many-core approaches in recent years. Such architectures offer the potential for sustainable performance scaling into future peta-scale/exa-scale computing platforms at affordable power budgets, design complexity, and verification effort. To date, multi-core processor products have been replacing uni-core processors in almost every market segment, including embedded systems, general-purpose desktops and laptops, and supercomputers. However, many issues with multi-core processor architectures still need to be addressed before their potential can be fully realized. Researchers in both academia and industry are still seeking effective and efficient ways to use these processors. The issues involve hardware architecture trade-offs, system software services, run-time management, and user application design, and they demand further research effort in this field. Given the architectural characteristics of multi-core computers, a Cross-Layer Customization framework is proposed in this work, which combines application-specific information and system platform features, along with the necessary operating system support, to achieve exceptional power and performance efficiency on targeted multi-core platforms. Several topics are covered, each with specific optimization goals, including snoop-based cache coherence protocols, inter-core communication for producer-consumer applications, synchronization mechanisms, and off-chip memory bandwidth limitations. Benchmark program execution with conventional mechanisms is analyzed to reveal the overheads in terms of power and performance. Specific customizations are proposed to eliminate these overheads with support from hardware, system software, the compiler, and user applications. Experiments show significant improvements in system performance and power efficiency.
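    The producer-consumer communication pattern that several of these customizations target can be sketched in generic software terms as follows; this only shows the software structure whose coherence and synchronization traffic the proposed hardware/OS customizations aim to reduce, and it is not the dissertation's mechanism.

```python
import queue
import threading

# Generic producer-consumer pattern: one "core" produces data that another
# consumes through a shared bounded buffer.
buf = queue.Queue(maxsize=8)

def producer(n):
    for i in range(n):
        buf.put(i * i)        # blocks when the buffer is full
    buf.put(None)             # sentinel: no more data

def consumer():
    total = 0
    while (item := buf.get()) is not None:
        total += item
    print("consumed sum:", total)

t1 = threading.Thread(target=producer, args=(100,))
t2 = threading.Thread(target=consumer)
t1.start(); t2.start(); t1.join(); t2.join()
```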
  • Item
    Cellular Pattern Quantification and Automatic Bench-marking Data-set Generation on confocal microscopy images
    (2010) Cui, Chi; JaJa, Joseph; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The distribution, directionality, and motility of actin fibers control cell shape, affect cell function, and differ between cancer and normal cells. Quantification of actin structural changes is important for further understanding differences between cell types and for elucidating the effects and dynamics of drug interactions. We propose an image analysis framework to quantify the F-actin organization patterns in response to different pharmaceutical treatments. The main problems addressed include which features to quantify and what quantification measurements to compute when dealing with unlabeled confocal microscopy images. The resulting numerical features are very effective for profiling the functional mechanisms and facilitate the comparison of different drugs. The analysis software was originally implemented in Matlab; more recently, the most time-consuming part of the feature extraction stage was implemented on an NVIDIA GPU using CUDA, where we obtain 15- to 20-fold speedups for different image sizes. We also propose a computational framework for generating synthetic images for validation purposes. The feature extraction is validated by visual inspection, and the quantification is validated by comparison with well-known biological facts. Future studies will further validate the algorithms and elucidate the molecular pathways and kinetics underlying the F-actin changes. This is the first study to quantify different structural formations of the same protein in intact cells. Since many anti-cancer drugs target the cytoskeleton, we believe that the quantitative image analysis method reported here will have broad applications to understanding the mechanisms of candidate pharmaceuticals.
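    One simple way to quantify fiber directionality in an intensity image is a gradient-orientation histogram weighted by gradient magnitude, sketched below; this is a generic texture measure chosen for illustration and is not necessarily among the features used in this work.

```python
import numpy as np

def orientation_histogram(img, bins=18):
    """A basic directionality feature: histogram of local gradient
    orientations in [0, pi), weighted by gradient magnitude."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-12)       # normalized distribution

# A strongly oriented pattern concentrates mass in a few bins;
# an isotropic pattern spreads it evenly across all bins.
stripes = np.tile(np.sin(np.linspace(0, 8 * np.pi, 128)), (128, 1))
print(orientation_histogram(stripes).round(2))
```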
  • Item
    Long-term Information Preservation and Access
    (2010) Song, Sang Chul; JaJa, Joseph F; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    An unprecedented amount of information encompassing almost every facet of human activities across the world is generated daily in the form of zeros and ones, and that is often the only form in which such information is recorded. A good fraction of this information needs to be preserved for periods of time ranging from a few years to centuries. Consequently, the problem of preserving digital information over the long term has attracted the attention of many organizations, including libraries, government agencies, scientific communities, and individual researchers. In this dissertation, we address three issues that are critical to ensuring long-term information preservation and access. The first concerns the core requirement of how to guarantee the integrity of preserved contents. Digital information is in general very fragile because of the many ways errors can be introduced, such as errors caused by hardware and media degradation, hardware and software malfunction, operational errors, security breaches, and malicious alterations. To address this problem, we develop a new approach based on efficient and rigorous cryptographic techniques, which guarantees the integrity of preserved contents with extremely high probability even in the presence of malicious attacks. Our prototype implementation of this approach has been deployed and actively used over the past few years in several organizations, including the San Diego Supercomputer Center, the Chronopolis Consortium, North Carolina State University, and, more recently, the Government Printing Office. Second, we consider another crucial component in any preservation system - searching and locating information. The ever-growing size of a long-term archive and the temporality of each preserved item introduce a new set of challenges to providing fast retrieval of content based on temporal queries. The widely used cataloguing scheme has serious scalability problems, and the standard full-text search approach has serious limitations since it does not deal appropriately with the temporal dimension and, in particular, is incapable of performing relevancy scoring according to the temporal context. To address these problems, we introduce two types of indexing schemes - a location indexing scheme and a full-text search indexing scheme. Our location indexing scheme provides optimal operations for inserting and locating a specific version of a preserved item given an item ID and a time point, and our full-text search indexing scheme efficiently handles the scalability problem while supporting relevancy scoring within the temporal context. Finally, we address the problem of organizing inter-related data so that future accesses and data exploration can be performed quickly. In particular, we consider web content, where we combine a link-analysis scheme with a graph partitioning scheme to place closely related content in the same standard web archive container. We conduct experiments that simulate random browsing of preserved contents and show that our data organization scheme greatly reduces the number of containers that need to be accessed during a random browsing session. Our schemes have been tested against real-world data of significant scale and validated through extensive empirical evaluations.
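    The integrity requirement can be illustrated with the simplest form of cryptographic auditing: record a digest for each item at deposit time and recompute it later. The sketch below is only that baseline idea; the dissertation's approach layers more rigorous techniques on top of it to withstand malicious alterations of both data and metadata.

```python
import hashlib
import os
import tempfile

def digest(path, algo="sha256", chunk=1 << 20):
    """Compute a cryptographic digest of a file, streamed in chunks."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def audit(manifest):
    """Recompute digests and report items whose content has changed or vanished.
    `manifest` maps path -> digest recorded at deposit time."""
    return [p for p, d in manifest.items()
            if not os.path.exists(p) or digest(p) != d]

# Self-contained demo: deposit one item, record its digest, then tamper with it.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "item.bin")
    open(path, "wb").write(b"preserved content")
    manifest = {path: digest(path)}
    open(path, "ab").write(b" tampered")        # simulate silent corruption
    print("failed audit:", audit(manifest))     # -> [path]
```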
  • Item
    MODELING AND OPTIMIZATION TECHNIQUES FOR EFFICIENT IMPLEMENTATION OF PARALLEL EMBEDDED SYSTEMS
    (2010) GU, RUIRUI; Bhattacharyya, Shuvra S; Levine, William S; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Embedded systems are becoming increasingly important. The products containing embedded systems span day-to-day household and consumer products, such as digital TVs, mobile phones, and automobiles, as well as industrial devices and equipment, including, for example, robots, aviation equipment, and high-end military and scientific devices such as aircraft. Previously, because embedded systems were highly limited in computational capability, memory size, and power consumption, much research was dedicated to making the best use of limited system resources. In these works, system performance issues, such as execution time, were traded off against system resources, and resources were carefully scheduled and utilized. With more computational capability available in embedded devices, and more complicated requirements demanding more intensive computation, the most critical design concerns are changing in some important application domains. In such application areas, researchers are paying increasing attention to improving system execution time, which is also the core topic of our work. Execution time is especially critical to real-time systems, in the sense that it is related not only to system performance, but also to system correctness and reliability. Multi-core devices, which incorporate two or more processors on the same integrated circuit, are becoming increasingly relevant to the design and implementation of embedded systems. In multi-core platforms, carefully managing communication and synchronization among different cores is important for achieving efficient implementations. Two or more processing cores sharing the same system bus and memory bandwidth limit the achievable performance improvements. The ability of multi-core processors to increase application performance depends on the use of multiple concurrent tasks within applications. Therefore, if code is written in a form that facilitates decomposition into concurrent tasks, multi-core technologies can be exploited more effectively. Dataflow-based languages are suitable for such decomposition into concurrent tasks, particularly in the broad domain of digital signal processing (DSP) applications. Dataflow representations of DSP software have been explored actively since the 1980s. Such representations have proved to be useful in identifying bottlenecks in DSP algorithms, improving the efficiency of the computations, and designing appropriate hardware for implementing the algorithms. Dataflow descriptions have been used in a wide range of DSP application areas, such as multimedia processing and wireless communications. Among various forms of dataflow modeling, synchronous dataflow (SDF) is geared towards static scheduling of computational modules, which improves system performance and predictability. However, many DSP applications do not fully conform to the restrictions of SDF modeling. More general dataflow models, such as CAL, have been developed to describe dynamically structured DSP applications. Such generalized models can express dynamically changing functionality, but lose the powerful static scheduling capabilities provided by SDF. This thesis explores modeling and optimization techniques for the efficient implementation of parallel embedded systems. We propose a dataflow-based framework, which covers modeling, analysis, and optimization and bridges user-friendly design and efficient implementation. The framework is applied to two kinds of applications: control systems and video processing systems.
    Model Predictive Control (MPC) has been used in a wide range of application areas including chemical engineering, food processing, automotive engineering, aerospace, and metallurgy. An important limitation on the application of MPC is the difficulty of completing the necessary computations within the sampling interval. Recent trends in computing hardware towards greatly increased parallelism offer a solution to this problem. Our work describes modeling and analysis tools to facilitate implementing MPC algorithms on parallel computers, thereby greatly reducing the time needed to complete the calculations. The use of these tools is illustrated by an application to the critical components of an important class of MPC problems, including the Newton-KKT algorithm, the active set method, and linear system solvers. This thesis also presents an in-depth case study of dataflow-based analysis and exploitation of parallelism in the design and implementation of an MPEG RVC (reconfigurable video coding) decoder. Because dataflow models are effective in exposing concurrency and other important forms of high-level application structure, dataflow techniques are promising for implementing complex DSP applications on multi-core systems and other kinds of parallel processing platforms. Targeting video processing systems, we use the CAL language as a concrete framework for representing and demonstrating dataflow design techniques. We also analyze our application of the DIF package (TDP), which helps to automatically process regions that are extracted from the original network and exhibit properties similar to synchronous dataflow (SDF) models. Detection of SDF-like regions is an important step for applying static scheduling techniques within a dynamic dataflow framework. Furthermore, segmenting a system into SDF-like regions also allows us to explore cross-actor concurrency that results from dynamic dependencies among different regions. Using SDF-like region detection as a preprocessing step to software synthesis generally provides an efficient way to map tasks to multi-core systems and improves the performance of video processing applications on multi-core platforms. Finally, the automation from system design to efficient implementation helps extend our dataflow-based modeling and optimization techniques to a wide range of embedded applications.
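    The static schedulability that SDF provides (and that the SDF-like region detection above tries to recover) comes from the balance equations: for every edge, the source's repetition count times its production rate must equal the sink's repetition count times its consumption rate. A small sketch of solving those equations for the minimal repetitions vector (illustrative only; the thesis works with CAL and the DIF package rather than this toy solver):

```python
from fractions import Fraction
from math import gcd

def repetitions_vector(edges, num_actors):
    """Solve the SDF balance equations q[src]*prod == q[dst]*cons for the
    smallest positive integer repetitions vector. A single connected graph
    is assumed; inconsistent graphs raise ValueError."""
    q = [None] * num_actors
    q[0] = Fraction(1)
    changed = True
    while changed:                          # propagate rates along edges
        changed = False
        for src, dst, prod, cons in edges:
            if q[src] is not None and q[dst] is None:
                q[dst] = q[src] * prod / cons; changed = True
            elif q[dst] is not None and q[src] is None:
                q[src] = q[dst] * cons / prod; changed = True
            elif q[src] is not None and q[dst] is not None:
                if q[src] * prod != q[dst] * cons:
                    raise ValueError("inconsistent SDF graph")
    lcm_den = 1
    for x in q:
        lcm_den = lcm_den * x.denominator // gcd(lcm_den, x.denominator)
    return [int(x * lcm_den) for x in q]

# Actor A produces 3 tokens per firing, actor B consumes 2 per firing:
# A must fire twice for every three firings of B.
print(repetitions_vector([(0, 1, 3, 2)], 2))   # -> [2, 3]
```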
  • Item
    An Architecture for High-throughput and Improved-quality Stereo Vision Processor
    (2010) Han, Sang-Kyo; Qu, Gang; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    This paper presents a VLSI architecture for achieving high-throughput, improved-quality stereo vision in real applications. The stereo vision processor generates gray-scale output images with depth information from input images taken by two CMOS Image Sensors (CIS). The depth estimator, which uses the sum of absolute differences (SAD) algorithm as its stereo matching technique, is implemented in hardware by exploiting pipelining and parallelism. To produce depth maps with improved quality in real time, pre- and post-processing units are adopted, and to enhance the adaptability of the system to real environments, special function registers (SFRs) are assigned to vision parameters. The design, in a 0.18um standard CMOS technology, can operate at a 120MHz clock, achieving over 140 frames/sec depth maps with a 320 by 240 image size and 64 disparity levels. Experimental results based on images taken in the real world and on the Middlebury data set are presented, together with comparison data for existing hardware systems and the hardware specifications of the proposed processor.
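    The SAD block-matching step at the heart of the depth estimator can be written compactly in software as a reference for what the pipelined hardware computes in parallel. The sketch below uses SciPy's uniform filter for the window sums and simplified (wrap-around) boundary handling; it illustrates the algorithm, not the processor's implementation.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def sad_disparity(left, right, max_disp=64, win=7):
    """Block-matching disparity via the sum of absolute differences (SAD):
    for every pixel, choose the horizontal shift d whose windowed SAD over a
    win x win neighborhood is smallest."""
    left = left.astype(np.float32)
    right = right.astype(np.float32)
    best_cost = np.full(left.shape, np.inf, dtype=np.float32)
    disparity = np.zeros(left.shape, dtype=np.uint8)
    for d in range(max_disp):
        shifted = np.roll(right, d, axis=1)                 # candidate match
        cost = uniform_filter(np.abs(left - shifted), win)  # windowed SAD (scaled)
        better = cost < best_cost
        best_cost[better] = cost[better]
        disparity[better] = d
    return disparity

# Example on a synthetic rectified pair: the right image is the left image
# shifted by 5 pixels, so the dominant recovered disparity is 5.
l = np.random.rand(240, 320)
r = np.roll(l, -5, axis=1)
print(np.bincount(sad_disparity(l, r, max_disp=16).ravel()).argmax())  # -> 5
```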