Efficient Detection of Objects and Faces with Deep Learning

dc.contributor.advisor: Davis, Larry S. [en_US]
dc.contributor.author: Najibi, Mahyar [en_US]
dc.contributor.department: Computer Science [en_US]
dc.contributor.publisher: Digital Repository at the University of Maryland [en_US]
dc.contributor.publisher: University of Maryland (College Park, Md.) [en_US]
dc.date.accessioned: 2021-07-01T05:30:53Z
dc.date.available: 2021-07-01T05:30:53Z
dc.date.issued: 2020 [en_US]
dc.description.abstract: Object detection is a fundamental problem in computer vision and an essential building block for many applications, such as autonomous driving, visual search, and object tracking. Because these applications are large-scale and real-time, scalable training and fast inference are critical. Deep neural networks are powerful in visual recognition, but they can be computationally expensive. They also have shortcomings that can degrade detection, such as a lack of scale invariance and inaccurate predictions in crowded scenes. This dissertation studies the intrinsic problems that emerge when deep convolutional neural networks are used for object and face detection. It introduces methods to overcome these problems that are both accurate and efficient. First, we focus on the lack of scale invariance. Performing inference on a multi-scale image pyramid is effective, but it increases computation noticeably. Moreover, multi-scale inference is most effective when the model is also trained with expensive multi-scale approaches. We therefore start by introducing an efficient multi-scale training algorithm called "SNIPER" (Scale Normalization for Image Pyramids with Efficient Re-sampling). Guided by the ground-truth annotations, SNIPER sparsely samples high-resolution image regions wherever they are needed. At inference time, however, no ground-truth information is available to guide region sampling. We therefore propose "AutoFocus", which predicts from low-resolution inputs which regions should be zoomed into at inference time. This makes it possible to skip a large portion of the input pyramid. These methods are as efficient as single-scale detectors while boosting performance noticeably. Second, we study the problem of efficient face detection. Compared to generic objects, faces are rigid, and crowded scenes containing hundreds of faces at extreme scales are more common. We present "SSH" (Single Stage Headless Face Detector). Unlike two-stage localization/classification detectors, SSH performs both tasks in a single stage and models scale variation efficiently by design. It also removes most of the parameters of its underlying network, yet still achieves state-of-the-art results on challenging benchmarks. For the two-stage detection paradigm, we further introduce "FA-RPN" (Floating Anchor Region Proposal Network). FA-RPN takes the spatial structure of faces into account and allows the prediction density to be modified during inference to deal efficiently with crowded scenes. Finally, we turn our attention to the first step in two-stage localization/classification detectors. Neural networks were already used for classification, but localization was previously handled by classic algorithms, which became the bottleneck. To remedy this, we propose "G-CNN", which models localization as a search in the space of all possible bounding boxes and uses the same neural network that performs classification. For tasks such as saliency detection, where the number of predictions is typically small, we also develop an alternative approach that runs at close to 120 frames/second. [en_US]
dc.identifier: https://doi.org/10.13016/g6qg-dkfy
dc.identifier.uri: http://hdl.handle.net/1903/27189
dc.language.iso: en [en_US]
dc.subject.pqcontrolled: Computer science [en_US]
dc.subject.pqcontrolled: Artificial intelligence [en_US]
dc.subject.pquncontrolled: Computer Vision [en_US]
dc.subject.pquncontrolled: Deep Learning [en_US]
dc.subject.pquncontrolled: Face Detection [en_US]
dc.subject.pquncontrolled: Object Detection [en_US]
dc.title: Efficient Detection of Objects and Faces with Deep Learning [en_US]
dc.type: Dissertation [en_US]
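
The abstract says SNIPER samples high-resolution regions "based on the ground-truth annotations". The following is a minimal sketch of that idea, not the dissertation's code. It assumes fixed-size square chips laid out on a regular grid, with chips chosen greedily until every ground-truth box at the current pyramid scale lies inside at least one chip. The chip size (512) and stride (32) are assumed values. The function names are hypothetical. The sketch omits filtering boxes to a per-scale valid size range and sampling negative chips.

```python
from typing import List, Tuple

# (x1, y1, x2, y2) in pixels of the image at the current pyramid scale
Box = Tuple[int, int, int, int]


def candidate_chips(width: int, height: int, chip_size: int = 512, stride: int = 32) -> List[Box]:
    """Hypothetical helper: all chip_size x chip_size windows placed every `stride` pixels.

    The last window in each direction is snapped to the right/bottom border
    so that objects near the edge can still be covered.
    """
    max_x = max(width - chip_size, 0)
    max_y = max(height - chip_size, 0)
    xs = list(range(0, max_x + 1, stride))
    ys = list(range(0, max_y + 1, stride))
    if xs[-1] != max_x:
        xs.append(max_x)
    if ys[-1] != max_y:
        ys.append(max_y)
    return [(x, y, x + chip_size, y + chip_size) for y in ys for x in xs]


def contains(chip: Box, box: Box) -> bool:
    """True if `box` lies completely inside `chip`."""
    return (chip[0] <= box[0] and chip[1] <= box[1]
            and box[2] <= chip[2] and box[3] <= chip[3])


def select_positive_chips(boxes: List[Box], width: int, height: int,
                          chip_size: int = 512, stride: int = 32):
    """Greedily pick the chip covering the most still-uncovered boxes.

    Returns (selected_chips, sorted indices of boxes that fit in no chip).
    """
    chips = candidate_chips(width, height, chip_size, stride)
    uncovered = set(range(len(boxes)))
    selected: List[Box] = []
    while uncovered:
        best_chip, best_cover = None, set()
        for chip in chips:
            cover = {i for i in uncovered if contains(chip, boxes[i])}
            if len(cover) > len(best_cover):
                best_chip, best_cover = chip, cover
        if best_chip is None:  # remaining boxes are larger than any chip
            break
        selected.append(best_chip)
        uncovered -= best_cover
    return selected, sorted(uncovered)


if __name__ == "__main__":
    gt = [(10, 10, 50, 50), (100, 100, 150, 150), (900, 500, 950, 550)]
    chips, missed = select_positive_chips(gt, width=1000, height=600)
    print(chips, missed)  # [(0, 0, 512, 512), (448, 64, 960, 576)] []
```

Only the selected chips, rather than the whole upsampled image, would then be fed to the detector. This is where the training-time savings the abstract describes would come from.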
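
For AutoFocus, the abstract says only that regions to zoom into are predicted from low resolutions. The following is a hedged sketch of one plausible post-processing step, assuming the network outputs a per-cell "focus" score map. The map is binarized with a threshold and dilated with a square kernel. Each connected component is then enclosed in a chip of at least a minimum size. The threshold, dilation radius, minimum size, and function names are all illustrative assumptions. Merging overlapping chips and mapping chips back to image coordinates are omitted.

```python
from collections import deque
from typing import List, Tuple

Chip = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in focus-map cells, x2/y2 exclusive


def _grow(lo: int, hi: int, size: int, limit: int) -> Tuple[int, int]:
    """Enlarge [lo, hi) to at least `size` cells, centred, kept inside [0, limit)."""
    need = size - (hi - lo)
    if need <= 0:
        return lo, hi
    lo -= need // 2
    hi += need - need // 2
    if lo < 0:  # shift right if the interval fell off the left/top edge
        hi, lo = hi - lo, 0
    if hi > limit:  # shift left if it fell off the right/bottom edge
        lo, hi = max(lo - (hi - limit), 0), limit
    return lo, hi


def focus_chips(focus_map: List[List[float]], threshold: float = 0.5,
                dilate: int = 1, min_size: int = 3) -> List[Chip]:
    """Hypothetical helper: turn a low-resolution focus map into zoom-in chips."""
    h, w = len(focus_map), len(focus_map[0])
    # 1. binarize the predicted focus map
    mask = [[v >= threshold for v in row] for row in focus_map]
    # 2. dilate with a (2*dilate+1) x (2*dilate+1) square kernel
    grown = [[any(mask[yy][xx]
                  for yy in range(max(0, y - dilate), min(h, y + dilate + 1))
                  for xx in range(max(0, x - dilate), min(w, x + dilate + 1)))
              for x in range(w)] for y in range(h)]
    # 3. find 4-connected components with BFS; 4. enclose each in one chip
    seen = [[False] * w for _ in range(h)]
    chips: List[Chip] = []
    for y in range(h):
        for x in range(w):
            if not grown[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            x1, y1, x2, y2 = x, y, x, y
            while queue:
                cy, cx = queue.popleft()
                x1, y1 = min(x1, cx), min(y1, cy)
                x2, y2 = max(x2, cx), max(y2, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and grown[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            cx1, cx2 = _grow(x1, x2 + 1, min_size, w)
            cy1, cy2 = _grow(y1, y2 + 1, min_size, h)
            chips.append((cx1, cy1, cx2, cy2))
    return chips


if __name__ == "__main__":
    fmap = [[0.0] * 8 for _ in range(6)]
    fmap[1][1], fmap[4][6] = 0.9, 0.8  # two small "hot" regions
    print(focus_chips(fmap))  # [(0, 0, 3, 3), (5, 3, 8, 6)]
```

Cells outside every chip would not be processed at the next, higher resolution. This is how skipping "a large portion of the input pyramid" could be realized under these assumptions.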

Files

Original bundle

Name: Najibi_umd_0117E_20760.pdf
Size: 39.75 MB
Format: Adobe Portable Document Format