Electrical & Computer Engineering Theses and Dissertations

Permanent URI for this collectionhttp://hdl.handle.net/1903/2765

Browse

Search Results

Now showing 1 - 4 of 4
  • Item
    Model-Based Design for High-Performance Signal Processing Applications
    (2019) Wu, Jiahao; Bhattacharyya, Shuvra S.; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Developing high-performance signal processing applications requires not only effective signal processing algorithms but also efficient software design methods that can take full advantage of the available processing resources. An increasingly important type of hardware platform for high-performance signal processing is a multicore central processing unit (CPU) combined with a graphics processing unit (GPU) accelerator. Efficiently coordinating computations on both the host (CPU) and device (GPU), and managing host-device data transfers are critical to utilizing CPU-GPU platforms effectively. However, such coordination is challenging for system designers, given the complexity of modern signal processing applications and the stringent constraints under which they must operate. Dataflow models of computation provide a useful framework for addressing this challenge. In such a modeling approach, signal processing applications are represented as directed graphs that can be viewed intuitively as high-level signal flow diagrams. The formal, high-level abstraction provided by dataflow principles provides a useful foundation to investigate model-based analysis and optimization for new challenges in design and implementation of signal processing systems. This thesis presents a new model-based design methodology and an evolution of three novel design tools. These contributions provide an automated design flow for high performance signal processing. The design flow takes high-level dataflow representations as input and systematically derives optimized implementations on CPU-GPU platforms. The proposed design flow and associated design methodology are inspired by a previously-developed application programming interface (API) called the Hybrid Task Graph Scheduler (HTGS). HTGS was developed for implementing scalable workflows for high-performance computing applications on compute nodes that have large numbers of processing cores, and that may be equipped with multiple GPUs. However, HTGS has a limitation due to its relatively loose use of dataflow techniques (or other forms of model-based design), which results in a significant designer effort being required to apply the provided APIs effectively. The main contributions of the thesis are summarized as follows: (1) Development of a companion tool to HTGS that is called the HTGS Model-based Engine (HMBE). HMBE introduces novel capabilities to automatically analyze application dataflow graphs and generate efficient schedules for these graphs through hybrid compile-time and runtime analysis. The systematic, model-based approaches provided by HMBE enable the automation of complex tasks that must be performed manually when using HTGS alone. We have demonstrated the effectiveness of HMBE and the associated model-based design methodology through extensive experiments involving two case studies: an image stitching application for large scale microscopy images, and a background subtraction application for multispectral video streams. (2) Integration of HMBE with HTGS to develop a new design tool for the design and implementation of high-performance signal processing systems. This tool, called HMBE-Integrated-HTGS (HI-HTGS), provides novel capabilities for model-based system design, memory management, and scheduling targeted to multicore platforms. HMBE takes as input a single- or multi-dimensional dataflow model of the given signal processing application. The tool then expands the dataflow model into an expanded representation that exposes more parallelism and provides significantly more detail on the interactions between different application tasks (dataflow actors). This expanded representation is derived by HI-HTGS at compile-time and provided as input to the HI-HTGS runtime system. The runtime system in turn applies the expanded representation to guide dynamic scheduling decisions throughout system execution. (3) Extension of HMBE to the class of CPU-GPU platforms motivated above. We call this new model-based design tool the CPU-GPU Model-Based Engine (CGMBE). CGMBE uses an unfolded dataflow graph representation of the application along with thread-pool-based executors, which are optimized for efficient operation on the targeted CPU-GPU platform. This approach automates complex aspects of the design and implementation process for signal processing system designers while maximizing the utilization of computational power, reducing the memory footprint for both the CPU and GPU, and facilitating experimentation for tuning performance-oriented designs.
  • Item
    Dynamic Resource Allocation in Wireless Heterogeneous Networks
    (2015) Singh, Vaibhav; Shayman, Prof. Mark; La, Prof. Richard; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Deployment of low power basestations within cellular networks can potentially increase both capacity and coverage. However, such deployments require efficient resource allocation schemes for managing interference from the low power and macro basestations that are located within each other’s transmission range. In this dissertation, we propose novel and efficient dynamic resource allocation algorithms in the frequency, time and space domains. We show that the proposed algorithms perform better than the current state-of-art resource management algorithms. In the first part of the dissertation, we propose an interference management solution in the frequency domain. We introduce a distributed frequency allocation scheme that shares frequencies between macro and low power pico basestations, and guarantees a minimum average throughput to users. The scheme seeks to minimize the total number of frequencies needed to honor the minimum throughput requirements. We evaluate our scheme using detailed simulations and show that it performs on par with the centralized optimum allocation. Moreover, our proposed scheme outperforms a static frequency reuse scheme and the centralized optimal partitioning between the macro and picos. In the second part of the dissertation, we propose a time domain solution to the interference problem. We consider the problem of maximizing the alpha-fairness utility over heterogeneous wireless networks (HetNets) by jointly optimizing user association, wherein each user is associated to any one transmission point (TP) in the network, and activation fractions of all TPs. Activation fraction of a TP is the fraction of the frame duration for which it is active, and together these fractions influence the interference seen in the network. To address this joint optimization problem which we show is NP-hard, we propose an alternating optimization based approach wherein the activation fractions and the user association are optimized in an alternating manner. The subproblem of determining the optimal activation fractions is solved using a provably convergent auxiliary function method. On the other hand, the subproblem of determining the user association is solved via a simple combinatorial algorithm. Meaningful performance guarantees are derived in either case. Simulation results over a practical HetNet topology reveal the superior performance of the proposed algorithms and underscore the significant benefits of the joint optimization. In the final part of the dissertation, we propose a space domain solution to the interference problem. We consider the problem of maximizing system utility by optimizing over the set of user and TP pairs in each subframe, where each user can be served by multiple TPs. To address this optimization problem which is NP-hard, we propose a solution scheme based on difference of submodular function optimization approach. We evaluate our scheme using detailed simulations and show that it performs on par with a much more computationally demanding difference of convex function optimization scheme. Moreover, the proposed scheme performs within a reasonable percentage of the optimal solution. We further demonstrate the advantage of the proposed scheme by studying its performance with variation in different network topology parameters.
  • Item
    MULTI-SCALE SCHEDULING TECHNIQUES FOR SIGNAL PROCESSING SYSTEMS
    (2013) Zhou, Zheng; Bhattacharyya, Shuvra S; Qu, Gang; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    A variety of hardware platforms for signal processing has emerged, from distributed systems such as Wireless Sensor Networks (WSNs) to parallel systems such as Multicore Programmable Digital Signal Processors (PDSPs), Multicore General Purpose Processors (GPPs), and Graphics Processing Units (GPUs) to heterogeneous combinations of parallel and distributed devices. When a signal processing application is implemented on one of those platforms, the performance critically depends on the scheduling techniques, which in general allocate computation and communication resources for competing processing tasks in the application to optimize performance metrics such as power consumption, throughput, latency, and accuracy. Signal processing systems implemented on such platforms typically involve multiple levels of processing and communication hierarchy, such as network-level, chip-level, and processor-level in a structural context, and application-level, subsystem-level, component-level, and operation- or instruction-level in a behavioral context. In this thesis, we target scheduling issues that carefully address and integrate scheduling considerations at different levels of these structural and behavioral hierarchies. The core contributions of the thesis include the following. Considering both the network-level and chip-level, we have proposed an adaptive scheduling algorithm for wireless sensor networks (WSNs) designed for event detection. Our algorithm exploits discrepancies among the detection accuracy of individual sensors, which are derived from a collaborative training process, to allow each sensor to operate in a more energy efficient manner while the network satisfies given constraints on overall detection accuracy. Considering the chip-level and processor-level, we incorporated both temperature and process variations to develop new scheduling methods for throughput maximization on multicore processors. In particular, we studied how to process a large number of threads with high speed and without violating a given maximum temperature constraint. We targeted our methods to multicore processors in which the cores may operate at different frequencies and different levels of leakage. We develop speed selection and thread assignment schedulers based on the notion of a core's steady state temperature. Considering the application-level, component-level and operation-level, we developed a new dataflow based design flow within the targeted dataflow interchange format (TDIF) design tool. Our new multiprocessor system-on-chip (MPSoC)-oriented design flow, called TDIF-PPG, is geared towards analysis and mapping of embedded DSP applications on MPSoCs. An important feature of TDIF-PPG is its capability to integrate graph level parallelism and actor level parallelism into the application mapping process. Here, graph level parallelism is exposed by the dataflow graph application representation in TDIF, and actor level parallelism is modeled by a novel model for multiprocessor dataflow graph implementation that we call the Parallel Processing Group (PPG) model. Building on the contribution above, we formulated a new type of parallel task scheduling problem called Parallel Actor Scheduling (PAS) for chip-level MPSoC mapping of DSP systems that are represented as synchronous dataflow (SDF) graphs. In contrast to traditional SDF-based scheduling techniques, which focus on exploiting graph level (inter-actor) parallelism, the PAS problem targets the integrated exploitation of both intra- and inter-actor parallelism for platforms in which individual actors can be parallelized across multiple processing units. We address a special case of the PAS problem in which all of the actors in the DSP application or subsystem being optimized can be parallelized. For this special case, we develop and experimentally evaluate a two-phase scheduling framework with three work flows --- particle swarm optimization with a mixed integer programming formulation, particle swarm optimization with a simulated annealing engine, and particle swarm optimization with a fast heuristic based on list scheduling. Then, we extend our scheduling framework to support general PAS problem which considers the actors cannot be parallelized.
  • Item
    Resource Allocation Schemes for OFDMA Based Wireless Systems with Quality of Service Constraints
    (2007-10-08) Girici, Tolga; Ephremides, Anthony; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    With its capabilities like elimination of intersymbol interference, intercell interference averaging, scalability and high bandwidth efficiency OFDMA is becoming the basis for current wireless communication technologies. In this dissertation we study the problem of multiple access and resource allocation for OFDMA-based cellular systems that support users with various quality of service (QoS) requirements. In Chapters 2 and 3 of the dissertation, we consider the problem of downlink transmission (from base station to users) for proportional fairness of long term averaged received rates of data users as well as QoS for voice and video sessions. Delay requirements of real time sessions are converted into rate requirements at each frame. The base station allocates available power and bandwidth to individual users based onreceived rates, rate constraints and channel conditions. We formulate and solve the underlying constrained optimization problem and propose an algorithm that achieves the optimal allocation. In Chapter 3, we obtain a resource allocation scheme that is simpler but achieves a performance comparable to the optimal algorithm proposed in Chapter 2. The algorithms that we propose are especially intended for broadband networks supporting mobile users as the subchannelization scheme we assume averages out the fading in subchannels and performs better under fast fading environment. This also leads to algorithms that are simpler than the ones available in the literature. In Chapter 4 of the dissertation we include relay stations to the previousmodel. The use of low-cost relay stations in OFDM based broadband networks receives increasing attention as they help to improve the cell coverage. For a network supporting heterogeneous traffic we study TDMA based subframe allocation for base and relay stations as well as joint power/bandwidth allocation for individual sessions. We propose an algorithm again using the constrained optimization framework. Our numerical results prove that our multihop relay scheme indeed improves the network coverage and satisfy QoS requirements of user at the cell edge. In the last Chapter, we deviate from the previous chapters and consider an OFDMA based system where the subchannels experience frequency selective fading. We investigate a standard subchannel allocation scheme that exploits multiuser diversity by allocating each subchannel to the user with maximum normalized SNR. Using extreme value theory and generating function approach we did a queueing analysis for this system and estimated the QoS violations through finding the tail distribution of the queue sizes of users. Simulation results show that our estimates are quite accurate and they can be used in admission control and rate control to improve the resource utilization in the system.