DEEP LEARNING ENSEMBLES FOR LIGHTWEIGHT OBJECT DETECTION

Date

2023

Abstract

Object detection, the task of identifying and localizing important objects within an image frame, is critical in automation, surveillance, and safety applications. Further, developments in lightweight sensor technologies, improved small-scale computing, and the widespread accessibility of well-labeled data have enabled numerous applications for object detection on inexpensive or low-power hardware. Many applications, such as self-driving and unmanned aerial vehicles, must process sensor data as it arrives (in real time) using onboard hardware (at the edge) in order to continually inform systems such as navigation. Additionally, detection must often be achieved on platforms with limited Size, Weight, and Power (SWaP), since it may not be possible to place advanced computer hardware near the sensor. This presents a unique challenge: how can we best provide accurate real-time object detection on limited SWaP systems while maintaining low power and computational cost?

A widespread approach to detection is deep learning. An object detection network is trained on a labeled dataset of images containing known objects and their locations. After training, the network may be used to perform inference on new data, providing both bounding boxes and a class identifier for each box. Popular single-shot detectors have been demonstrated to achieve real-time performance on some systems while maintaining acceptable detection accuracy.

An ensemble is a system composed of several detectors. In theory, detectors with architectural differences, detectors trained on different data, or detectors given differently augmented data at inference time will discover and detect different features of an image. Unifying the results of several different detectors has been demonstrated to improve the detection performance of the ensemble compared to the performance of any component network, at the expense of additional computational cost. Further, systems using an ensemble of detectors have been shown to be good solutions to object detection problems in limited SWaP applications such as surveillance and search-and-rescue.

Unlike tasks such as classification, where the output of a network describes the entire input, object detection is concerned both with localization and classification of one or multiple objects in an image. Two different bounding boxes for partially occluded objects may overlap, or highly similar bounding boxes may describe the same object. As a result, unifying the results of object detector networks is far more difficult than unifying classifier networks. Current works typically accomplish this by applying strategies that iteratively combine bounding boxes by overlap. However, little comparative study has been done to determine the effectiveness of these approaches.
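
As an illustration of what combining bounding boxes by overlap can look like, the following Python sketch pools the boxes from several detectors, greedily groups same-class boxes whose intersection-over-union (IoU) with a group's highest-scoring box meets a threshold, and replaces each group with a score-weighted average box. It is a generic example and not the method developed in this thesis; the names Box, iou, and fuse_detections, the 0.5 threshold, and the choice to keep the maximum score are all assumptions made for illustration, and scores are assumed to be positive.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Hypothetical detection record: corner coordinates, confidence, class."""
    x1: float
    y1: float
    x2: float
    y2: float
    score: float
    label: str


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    inter_h = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = inter_w * inter_h
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse_detections(detections: List[List[Box]],
                    iou_threshold: float = 0.5) -> List[Box]:
    """Greedy overlap-based fusion (illustrative; assumes positive scores)."""
    # Pool every detector's boxes and visit them from most to least confident.
    pooled = sorted((b for det in detections for b in det),
                    key=lambda b: b.score, reverse=True)

    # Each cluster is led by its first (highest-scoring) member.
    clusters: List[List[Box]] = []
    for box in pooled:
        for cluster in clusters:
            lead = cluster[0]
            if lead.label == box.label and iou(lead, box) >= iou_threshold:
                cluster.append(box)
                break
        else:
            clusters.append([box])

    # Replace each cluster with a score-weighted average of its members.
    fused: List[Box] = []
    for cluster in clusters:
        total = sum(b.score for b in cluster)
        fused.append(Box(
            x1=sum(b.x1 * b.score for b in cluster) / total,
            y1=sum(b.y1 * b.score for b in cluster) / total,
            x2=sum(b.x2 * b.score for b in cluster) / total,
            y2=sum(b.y2 * b.score for b in cluster) / total,
            score=max(b.score for b in cluster),  # assumption: keep best score
            label=cluster[0].label,
        ))
    return fused


if __name__ == "__main__":
    detector_a = [Box(0, 0, 10, 10, 0.8, "car"), Box(50, 50, 60, 60, 0.9, "person")]
    detector_b = [Box(1, 1, 11, 11, 0.6, "car")]
    for fused_box in fuse_detections([detector_a, detector_b]):
        print(fused_box)
```

In this sketch the two "car" boxes overlap enough to be merged into one, while the "person" box is passed through unchanged. The threshold, the grouping rule, and the way scores are combined are exactly the kinds of design choices that differ between overlap-based strategies.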

This thesis builds on current methods of ensembling object detector networks using novel approaches to combine bounding boxes. We first introduce current methods for ensembling and a dataflow-based framework for efficient, scalable computation of ensembles of detectors. We then contribute a novel method for ensembling and implement a practical system for scalable detection using an elastic neural network.
