Design Techniques for Embedded Computer Vision and Signal Processing

Lee_umd_0117E_23703.pdf (51.28 MB)
In this thesis, we explore new design techniques to facilitate the implementation of efficient deep learning systems for embedded computer vision and signal processing. The techniques are developed to address concerns of real-time processing efficiency and energy efficiency under resource-constrained operation, as well as the accuracy considerations that are conventionally associated with the development of deep learning solutions. We study two specific application areas for efficient deep learning: (1) neural decoding and (2) object detection from multi-view images, such as those acquired from unmanned aerial vehicles (UAVs).

To address the challenges of efficient deep learning systems, we apply dataflow-based methods for the design and implementation of signal and information processing systems. Signal-processing-oriented dataflow concepts provide an efficient computational model with the flexibility and expandability needed to facilitate the design and implementation of complex signal and information processing systems. In dataflow modeling, applications are modeled as directed graphs, called dataflow graphs, in which vertices (actors) correspond to discrete computations that are executed and edges represent communication between pairs of actors.

In the first part of the thesis, we study in depth a recently introduced model of computation, called passive-active flow graphs (PAFGs), which can be used in conjunction with dataflow modeling to facilitate more efficient implementation of dataflow graphs.

In the second part of the thesis, we present the application of dataflow techniques to develop a novel system for real-time neural decoding. Neural decoding involves processing signals acquired from the brain, for example through calcium imaging technology, to predict behavioral variables. We refer to the developed system as the Neuron Detection and Signal Extraction Platform (NDSEP).
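As a minimal illustration of the dataflow model of computation described above, the sketch below shows actors firing on FIFO edges. The names and the simple while-loop scheduler are illustrative assumptions, not code from NDSEP or from any dataflow tool used in the thesis:

```python
from collections import deque

class Actor:
    """A dataflow actor: a discrete computation that fires when enough
    tokens are available on each of its input edges (FIFO channels)."""
    def __init__(self, name, inputs, outputs, func, threshold=1):
        self.name = name
        self.inputs = inputs      # list of deques read by this actor
        self.outputs = outputs    # list of deques written by this actor
        self.func = func
        self.threshold = threshold

    def enabled(self):
        return all(len(edge) >= self.threshold for edge in self.inputs)

    def fire(self):
        tokens = [edge.popleft() for edge in self.inputs]
        result = self.func(*tokens)
        for edge in self.outputs:
            edge.append(result)

# Edges are FIFO buffers that carry tokens between pairs of actors.
in_edge = deque([1, 2, 3])    # tokens produced by an upstream source
out_edge = deque()

scale = Actor("scale", [in_edge], [out_edge], lambda x: 10 * x)

# A trivial scheduler: fire the actor as long as it is enabled.
while scale.enabled():
    scale.fire()

print(list(out_edge))  # -> [10, 20, 30]
```

Because each actor declares only its token consumption and production behavior, modules can be swapped or rewired without changing the scheduler, which is the kind of modularity and extensibility the dataflow architecture provides.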
NDSEP incorporates streamlined subsystems for neural decoding that are integrated efficiently through dataflow modeling. The dataflow-based software architecture of NDSEP provides modularity and extensibility for experimenting with alternative modules for neural signal processing. Our system design also facilitates optimization of trade-offs between accuracy and real-time performance.

Additionally, we explore various factors beyond dataflow-based system design to develop efficient deep learning systems for embedded computer vision and signal processing. In the third part of the thesis, we address the problem of limited training data, which is significant for many application areas of embedded computer vision, especially areas that are highly specialized or at the very forefront of computer vision technology. We address this problem specifically in the context of deep learning for object detection from multi-view images acquired from unmanned aerial vehicles (UAVs). To help overcome the shortage of relevant training data in this class of object detection scenarios, we introduce a new dataset and associated metadata, which integrate real and synthetic data to provide a much larger collection of labeled data than is available from real data alone. We also apply the developed dataset to conduct comprehensive studies of how the critical attributes of UAV-based images affect machine learning models, and how these insights can be applied to advance the training and testing of the models.

In the fourth part of the thesis, we explore fundamental algorithm development for efficient object detection from multi-view images. In this work, we propose a simplified two-dimensional object detection technique that can be implemented to leverage multiple images of a scene. This work provides a simple but effective way to extend a detection architecture for a single-view image to an architecture for multi-view images.
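One generic way to see how a single-view detector can be extended to multiple views with little extra computation is late fusion: run the unchanged single-view detector once per view and merge the per-view confidences of detections matched to the same object. The fusion rule below (simple averaging) is a hypothetical stand-in, not the specific technique proposed in the thesis:

```python
def fuse_multiview(detections_per_view):
    """detections_per_view: list with one entry per view, each a dict
    mapping a matched object id to that view's detection confidence.
    Returns a dict mapping object id to a fused confidence, obtained by
    averaging over the views that detected the object -- the only
    computation added on top of the per-view single-view detector runs."""
    scores = {}
    for view in detections_per_view:
        for obj, confidence in view.items():
            scores.setdefault(obj, []).append(confidence)
    return {obj: sum(s) / len(s) for obj, s in scores.items()}

# Two views of the same scene; "person" is visible in only one of them.
fused = fuse_multiview([{"car": 0.9, "person": 0.4}, {"car": 0.7}])
print(fused)
```

Here the single-view architecture is reused as a black box per view, so the multi-view extension costs only the matching and averaging step.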
A useful feature of the proposed approach is that it requires only a minimal amount of additional computation to extend an architecture from single-view to multi-view operation.

In the fifth part of the thesis, we develop a novel approach to online learning, called the Recursive Online Neural DecOding (RONDO) framework, that is tailored for portable neural decoding systems, where computational resource constraints and energy efficiency are important concerns in addition to knowledge extraction accuracy. The characteristics of brain imaging signals may change significantly over time, making online learning an important tool for robust neural decoding. In online learning, the underlying machine learning model is updated dynamically as new input is received by the system. In this work, we build upon the existing understanding gained from recurrent neural network (RNN) algorithms, and introduce a new RNN-based online learning framework for neural decoding that provides robust, energy-efficient neural decoding on resource-constrained platforms. RONDO provides novel trade-offs between neural decoding accuracy and the energy consumed by the computationally intensive retraining rounds that are needed to update the underlying RNN model when characteristics of the input signal change significantly.
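The accuracy/energy trade-off described above can be illustrated with a generic drift-gated update loop: cheap incremental updates happen on every sample, while an expensive retraining round is triggered only when the input statistics shift sharply. This sketch uses a simple windowed z-score test as a placeholder; RONDO's actual RNN update and retraining criteria are not reproduced here:

```python
import statistics

def drift_gated_updates(stream, window=5, threshold=3.0):
    """Process a signal stream sample by sample. Each sample gets a cheap
    incremental update; a costly retraining round (here, just a counter
    and a statistics reset) fires only when a sample deviates from the
    recent window by more than `threshold` standard deviations."""
    history, retrains = [], 0
    for x in stream:
        if len(history) >= window:
            recent = history[-window:]
            mu = statistics.mean(recent)
            sd = statistics.stdev(recent) or 1.0  # guard a zero-variance window
            if abs(x - mu) > threshold * sd:
                retrains += 1     # expensive retraining round (placeholder)
                history = []      # restart statistics on the new regime
        history.append(x)         # cheap incremental update
    return retrains

# A signal whose characteristics change once, mid-stream.
n_retrains = drift_gated_updates([0] * 5 + [10] * 6)
print(n_retrains)  # -> 1
```

Raising `threshold` saves energy by retraining less often at the cost of decoding with a stale model for longer, which is the trade-off axis the framework exposes.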