Efficient Detection of Objects and Faces with Deep Learning

Najibi, Mahyar

Efficient Detection of Objects and Faces with Deep Learning

dc.contributor.advisor	Davis, Larry S.	en_US
dc.contributor.author	Najibi, Mahyar	en_US
dc.contributor.department	Computer Science	en_US
dc.contributor.publisher	Digital Repository at the University of Maryland	en_US
dc.contributor.publisher	University of Maryland (College Park, Md.)	en_US
dc.date.accessioned	2021-07-01T05:30:53Z
dc.date.available	2021-07-01T05:30:53Z
dc.date.issued	2020	en_US
dc.description.abstract	Object detection is a fundamental problem in computer vision and is an essential building block for many applications such as autonomous driving, visual search, and object tracking. Given its large-scale and real-time applications, scalable training and fast inference are critical. Deep neural networks, although powerful in visual recognition, can be computationally expensive. Besides, they introduce shortcomings such as lack of scale-invariance and inaccurate predictions in crowded scenes that can affect detection. This dissertation studies the intrinsic problems which emerge when deep convolutional neural networks are used for object and face detection. We introduce methods to overcome these issues which are not only accurate but also efficient. First, we focus on the problem of lack of scale-invariance. Performing inference on a multi-scale image pyramid, although effective, increases computation noticeably. Moreover, multi-scale inference really blooms when the model is also trained using expensive multi-scale approaches. As a result, we start by introducing an efficient multi-scale training algorithm called "SNIPER" (Scale Normalization for Image Pyramids with Efficient Re-sampling). Based on the ground-truth annotations, SNIPER sparsely samples high-resolution image regions wherever needed. In contrast to training, at inference, there is no ground-truth information to guide region sampling. Thus, we propose "AutoFocus". AutoFocus predicts regions to be zoomed-in from low resolutions at inference time, making it possible to skip a large portion of the input pyramid. While being as efficient as single-scale detectors, these methods boost performance noticeably. Second, we study the problem of efficient face detection. Compared to generic objects, faces are rigid and crowded scenes containing hundreds of faces with extreme scales are more common. In this dissertation, we present "SSH" (Single Stage Headless Face Detector). A method that unlike two-stage localization/classification detectors, performs both tasks in a single stage, efficiently models scale variation by design, and removes most of the parameters from its underlying network, but still achieves state-of-the-art results on challenging benchmarks. Furthermore, for the two-stage detection paradigm, we introduce "FA-RPN" (Floating Anchor Region Proposal Network). FA-RPN takes the spatial structure of faces into account and allows modification of the prediction density during inference to efficiently deal with crowded scenes. Finally, we turn our attention to the first step in two-stage localization/classification detectors. While neural networks were deployed for classification, localization was previously solved using classic algorithms which became the bottleneck. To remedy, we propose "G-CNN" which models localization as a search in the space of all possible bounding boxes and deploys the same neural network used for classification. Furthermore, for tasks such as saliency detection, where the number of predictions is typically small, we develop an alternative approach that runs at speeds close to 120 frames/second.	en_US
dc.identifier	https://doi.org/10.13016/g6qg-dkfy
dc.identifier.uri	http://hdl.handle.net/1903/27189
dc.language.iso	en	en_US
dc.subject.pqcontrolled	Computer science	en_US
dc.subject.pqcontrolled	Artificial intelligence	en_US
dc.subject.pquncontrolled	Computer Vision	en_US
dc.subject.pquncontrolled	Deep Learning	en_US
dc.subject.pquncontrolled	Face Detection	en_US
dc.subject.pquncontrolled	Object Detection	en_US
dc.title	Efficient Detection of Objects and Faces with Deep Learning	en_US
dc.type	Dissertation	en_US

Files

Original bundle

Now showing 1 - 1 of 1

Name:: Najibi_umd_0117E_20760.pdf
Size:: 39.75 MB
Format:: Adobe Portable Document Format

Download

Collections

UMD Theses and Dissertations
Computer Science Theses and Dissertations