USING CNNS TO UNDERSTAND LIGHTING WITHOUT REAL LABELED TRAINING DATA

dc.contributor.advisor: Jacobs, David W. (en_US)
dc.contributor.author: Zhou, Hao (en_US)
dc.contributor.department: Computer Science (en_US)
dc.contributor.publisher: Digital Repository at the University of Maryland (en_US)
dc.contributor.publisher: University of Maryland (College Park, Md.) (en_US)
dc.date.accessioned: 2019-09-27T05:41:59Z
dc.date.available: 2019-09-27T05:41:59Z
dc.date.issued: 2019 (en_US)
dc.description.abstract: The task of computer vision is to make computers understand the physical world through images. Lighting is the medium through which we capture images of the physical world: without lighting there is no image, and different lighting leads to different images of the same scene. In this dissertation, we study how to understand lighting from images.

With the emergence of large datasets and deep learning in recent years, learning-based methods play an increasingly important role in computer vision, and deep Convolutional Neural Networks (CNNs) now dominate most computer vision problems. Despite their success, deep CNNs are notorious for being data hungry compared with traditional learning-based methods. While collecting images from the internet is easy and fast, labeling those images is time consuming, expensive, and sometimes impossible. In this work, we focus on understanding lighting from faces and natural scenes, settings in which ground truth lighting labels are impossible to obtain.

As a preliminary topic, we first study the capacity of deep CNNs. Designing deep CNNs with less capacity and good generalization is one way to reduce the amount of labeled data needed to train them, and understanding their capacity is the first step toward that goal. We empirically study the capacity of deep CNNs through the redundancy of their parameters. More specifically, we aim to optimize the number of neurons in a network, and thus the number of parameters. To achieve this, we incorporate sparse constraints into the objective function and apply a forward-backward splitting method to solve the resulting sparse constrained optimization problem efficiently. The proposed method significantly reduces the number of parameters, showing that networks with small capacity can work well (a minimal sketch of this step appears below).

Second, we study an important problem in computer vision: inverse lighting from a single face image. Lacking massive ground truth lighting labels, we generate a large amount of synthetic data with ground truth lighting to train a deep network. However, due to the large domain gap between real and synthetic data, a network trained only on synthetic data does not generalize well to real data. We therefore train the deep CNN on real data together with synthetic data. We apply an existing method to estimate the lighting conditions of real face images, but these lighting labels are noisy. We thus propose a Label Denoising Adversarial Network (LDAN) that uses the synthetic data to denoise the labels of real images while training a deep CNN to regress lighting from real face images. We show that the proposed method produces more consistent lighting estimates for faces taken under the same lighting condition.

Third, we study how to relight a face image using deep CNNs. We formulate this problem as supervised image-to-image translation. Lacking an "in the wild" face dataset suitable for this task, we apply a physically based face relighting method to generate a large-scale, high-resolution, "in the wild" portrait relighting dataset (DPR). A deep CNN is then trained on this dataset to generate a relit portrait image given a source image and a target lighting. We show that our training procedure regularizes the generated results, removing the artifacts caused by physically based relighting methods.
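To make the capacity study concrete, the following is a minimal sketch, not the dissertation's code, of one forward-backward splitting iteration with a group-sparse penalty that prunes whole neurons: a gradient step on the smooth training loss, followed by the proximal map of a group-lasso term that zeroes out a neuron's entire weight row. All names (group_soft_threshold, prune_step, lam) and the toy quadratic loss are illustrative assumptions.

```python
# Illustrative sketch only: forward-backward splitting for neuron-level
# sparsity. The dissertation operates on deep CNN layers; here a single
# weight matrix W stands in, with one row per neuron.
import numpy as np

def group_soft_threshold(W, tau):
    """Proximal map of tau * sum_i ||W[i, :]||_2 (group lasso).

    Shrinks each row (one neuron's weights) toward zero and removes it
    entirely once its norm drops below tau, pruning that neuron.
    """
    norms = np.maximum(np.linalg.norm(W, axis=1, keepdims=True), 1e-12)
    return W * np.maximum(0.0, 1.0 - tau / norms)

def prune_step(W, grad_W, lr=0.1, lam=0.5):
    """One forward-backward iteration (hypothetical parameter names).

    Forward step: gradient descent on the smooth data loss.
    Backward step: proximal map of the non-smooth sparsity penalty.
    """
    W = W - lr * grad_W                       # forward (gradient) step
    return group_soft_threshold(W, lr * lam)  # backward (proximal) step

# Toy usage: a quadratic loss 0.5 * ||W - T||^2 whose target T keeps only
# the first three neurons active; the remaining rows are driven to zero.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
T = np.zeros((8, 4))
T[:3] = 2.0
for _ in range(200):
    W = prune_step(W, W - T)  # gradient of the toy loss is W - T
print("surviving neurons:", int((np.linalg.norm(W, axis=1) > 0).sum()))
```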
Fourth, we study how to understand the lighting of a natural scene from a single RGB image. We propose a Global-Local Spherical Harmonics (GLoSH) lighting model to improve the lighting representation, and jointly predict reflectance and surface normals. The global SH models the holistic lighting while local SHs account for the spatial variation of lighting. A novel non-negative lighting constraint is proposed to encourage the estimated SHs to be physically meaningful. To make seamless use of the GLoSH model, we design a coarse-to-fine network structure. Lacking labels for reflectance and lighting, we pre-train the model on synthetic data and fine-tune it on real data in a self-supervised way. We show that the proposed method outperforms state-of-the-art methods in recovering the lighting, reflectance, and shading of a natural scene. (en_US)
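Since GLoSH builds on spherical harmonics, a brief sketch may clarify the representation: under a standard second-order SH lighting model, shading at a pixel is linear in nine SH coefficients evaluated at the surface normal, and a non-negative-lighting constraint can be approximated by penalizing negative shading over sampled directions. This is a minimal illustration under those standard assumptions, not the dissertation's implementation; SH normalization constants are omitted and all function names are hypothetical.

```python
# Minimal SH shading sketch (normalization constants of the SH basis are
# omitted for brevity; function names are illustrative assumptions).
import numpy as np

def sh_basis(n):
    """Second-order SH basis (9 terms) at unit normals n of shape (..., 3)."""
    x, y, z = n[..., 0], n[..., 1], n[..., 2]
    return np.stack([
        np.ones_like(x),            # l = 0 (ambient)
        y, z, x,                    # l = 1 (linear)
        x * y, y * z,               # l = 2 (quadratic)
        3.0 * z ** 2 - 1.0,
        x * z, x ** 2 - y ** 2,
    ], axis=-1)

def shade(normals, sh_coeffs):
    """Shading = <SH basis(normal), 9 lighting coefficients>."""
    return sh_basis(normals) @ sh_coeffs

def nonneg_penalty(sh_coeffs, num_samples=1024, seed=0):
    """Soft non-negative-lighting penalty: hinge loss on negative shading
    over randomly sampled unit directions (a stand-in for the constraint
    on the estimated SHs)."""
    rng = np.random.default_rng(seed)
    d = rng.normal(size=(num_samples, 3))
    d /= np.linalg.norm(d, axis=1, keepdims=True)
    return float(np.mean(np.maximum(0.0, -shade(d, sh_coeffs)) ** 2))

# Toy usage: ambient light plus a light from above; shading of an
# upward-facing normal is positive and the penalty is zero.
coeffs = np.array([1.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
up = np.array([[0.0, 0.0, 1.0]])
print("shading:", shade(up, coeffs))       # -> [1.5]
print("penalty:", nonneg_penalty(coeffs))  # -> 0.0
```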
dc.identifier: https://doi.org/10.13016/not9-3pcx
dc.identifier.uri: http://hdl.handle.net/1903/25051
dc.language.iso: en (en_US)
dc.subject.pqcontrolled: Computer science (en_US)
dc.subject.pquncontrolled: Deep CNNs (en_US)
dc.subject.pquncontrolled: Lighting (en_US)
dc.subject.pquncontrolled: Spherical Harmonics (en_US)
dc.subject.pquncontrolled: Synthetic data (en_US)
dc.title: USING CNNS TO UNDERSTAND LIGHTING WITHOUT REAL LABELED TRAINING DATA (en_US)
dc.type: Dissertation (en_US)

Files

Original bundle
Name: Zhou_umd_0117E_20294.pdf
Size: 80.16 MB
Format: Adobe Portable Document Format