Learning Along the Edge of Deep Neural Networks
MetadataShow full item record
While Deep Neural Networks (DNNs) have recently achieved impressive results on many classification tasks, it is still unclear why they perform so well and how to properly design them. It has been observed that while training and testing deep networks, some ideal conditions need to be met in order to achieve impressive performance. In particular, an abundance of training samples is required. These training samples should be lossless, perfectly labeled, and spanning various classes in a balanced way. A lot of empirical results suggest that deviating from such ideal conditions can severely affect the performance of DNNs. In this dissertation, we analyze each of these individual conditions to understand their effects on the performance of deep networks. Furthermore, we devise mitigation strategies when the ideal conditions may not be met. We, first, investigate the relationship between the performance of a convolutional neural network (CNN), its depth, and the size of its training set. Designing a CNN is a challenging task and the most common approach to picking the right architecture is to experiment with many parameters until a desirable performance is achieved. We derive performance bounds on CNNs with respect to the network parameters and the size of the available training dataset. We prove a sufficient condition --polynomial in the depth of the CNN-- on the training database size to guarantee such performance. We empirically test our theory on the problem of gender classification and explore the effect of varying the CNN depth, as well as the training distribution and set size. Under i.i.d. sampling of the training set, we show that the incremental benefit of a new training sample decreases exponentially with the training set size. Next, we study the structure of the CNN layers, by examining the convolutional, activation, and pooling layers, and showing a parallelism between this structure and another well-studied problem: Convolutional Sparse Coding (CSC). The sparse representation framework is a popular approach due to its desirable theoretical guarantees and the successful use of sparse representations as feature vectors in machine learning problems. Recently, a connection between CNNs and CSC was established using a simplified CNN model. Motivated by the use of spatial pooling in practical CNN implementations, we investigate the effect of using spatial pooling in the CSC model. We show that the spatial pooling operations do not hinder the performance and can introduce additional benefits. Then, we investigate three of the ideal conditions previously mentioned: the availability of vast amounts of noiseless and balanced training data. We overcome the difficulties resulting from deviating from this ideal scenario by modifying the training sampling strategy. Conventional DNN training algorithms sample training examples in a random fashion. This inherently assumes that, at any point in time, all training samples are equally important to the training process. However, empirical evidence suggests that the training process can benefit from different sampling strategies. Motivated by this objective, we consider the task of adaptively finding optimal training subsets which will be iteratively presented to the DNN. We use convex optimization methods, based on an objective criterion and a quantitative measure of the current performance of the classifier, to efficiently identify informative samples to train on. We propose an algorithm to decompose the optimization problem into smaller per-class problems, which can be solved in parallel. We test our approach on benchmark classification tasks and demonstrate its effectiveness in boosting performance while using even fewer training samples. We also show that our approach can make the classifier more robust in the presence of label noise and class imbalance. Finally, we consider the case where testing (and potentially training) samples are lossy, leading to the well-known compressed sensing framework. We use Generative Adversarial Networks (GANs) to impose structure in compressed sensing problems, replacing the usual sparsity constraint. We propose to train the GANs in a task-aware fashion, specifically for reconstruction tasks. We show that it is possible to train our model without using any (or much) non-compressed data. We also show that the latent space of the GAN carries discriminative information and can further be regularized to generate input features for general inference tasks. We demonstrate the effectiveness of our method on a variety of reconstruction and classification problems.