Pruning for Efficient Deep Learning: From CNNs to Generative Models
Abstract
Deep learning models have shown remarkable success in visual recognition and generative modeling tasks in computer vision over the last decade. A general trend is that their performance improves as the size of their training data, their model capacity, and their number of training iterations on modern hardware increase. However, larger models naturally incur higher computational complexity and memory footprints, necessitating high-end hardware for deployment. This trade-off prevents the deployment of deep learning models in resource-constrained environments such as robotic applications, mobile phones, and edge devices employed in the Artificial Intelligence of Things (AIoT). In addition, private companies and organizations must spend significant resources on cloud services to serve deep learning models to their customers. In this dissertation, we develop model pruning and Neural Architecture Search (NAS) methods to improve the inference efficiency of deep learning models for visual recognition and generative modeling applications. We tailor each method to the unique characteristics of the model and task it targets.
In the first part, we present model pruning and efficient NAS methods for Convolutional Neural Network (CNN) classifiers. We start by proposing a pruning method that leverages interpretations of a pretrained model's decisions to prune its redundant structures. Then, we provide an efficient NAS method that learns the kernel sizes of a CNN from its training dataset under a given parameter budget, enabling the design of efficient CNNs customized for their target applications. Finally, we develop a framework for simultaneous pretraining and pruning of CNNs, which combines the first two stages of the pretrain-prune-finetune pipeline commonly used in model pruning and reduces its complexity.
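To make the pretrain-prune-finetune pipeline concrete, the following is a minimal PyTorch sketch of its three stages. It uses generic L1-norm structured filter pruning as a stand-in for the interpretation-based criterion described above, and the finetuning step is a hypothetical placeholder.

```python
# Sketch of the pretrain-prune-finetune pipeline. L1-norm filter ranking
# stands in for the interpretation-based criterion; finetune() is hypothetical.
import torch.nn as nn
import torch.nn.utils.prune as prune
import torchvision

def prune_conv_filters(model: nn.Module, sparsity: float = 0.5) -> nn.Module:
    """Zero out the output filters with the smallest L1 norms in every conv layer."""
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            # ln_structured prunes whole filters (dim=0) ranked by L1 norm (n=1).
            prune.ln_structured(module, name="weight", amount=sparsity, n=1, dim=0)
            prune.remove(module, "weight")  # bake the mask into the weights
    return model

# Stage 1: pretrain (here, load an already-pretrained classifier instead).
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
# Stage 2: prune redundant structures.
model = prune_conv_filters(model, sparsity=0.5)
# Stage 3: finetune(model, train_loader)  # recover accuracy (hypothetical helper)
```

Note that this masks filters to zero rather than physically removing them; realizing actual speedups requires dropping the pruned channels from the architecture. A simultaneous pretraining-and-pruning framework collapses stages 1 and 2 into a single training run.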
In the second part, we propose model pruning methods for visual generative models. First, we present a pruning method for conditional Generative Adversarial Networks (GANs) in which we prune the generator and discriminator models collaboratively. We then address the inference efficiency of diffusion models by proposing a method that prunes a pretrained diffusion model into a mixture of efficient experts, each handling a separate part of the denoising process. Finally, we develop an adaptive, prompt-tailored pruning method for modern text-to-image diffusion models. It prunes a pretrained model such as Stable Diffusion into a mixture of efficient experts in which each expert specializes in a certain type of input prompt.
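As an illustration of the mixture-of-experts formulation for diffusion models, the hedged sketch below routes each denoising timestep to one of several smaller pruned experts, each covering a contiguous interval of the process. The expert modules and the uniform interval split are illustrative assumptions, not the dissertation's implementation.

```python
# Illustrative sketch: a diffusion model split into per-interval experts.
# Each expert is assumed to be a smaller, pruned denoiser with the usual
# (x_t, t) signature; the uniform split of timesteps is an assumption.
import torch
import torch.nn as nn

class TimestepMoE(nn.Module):
    """Route each denoising step to the expert owning its timestep interval."""
    def __init__(self, experts: list[nn.Module], num_steps: int = 1000):
        super().__init__()
        self.experts = nn.ModuleList(experts)
        self.num_steps = num_steps

    def forward(self, x_t: torch.Tensor, t: int) -> torch.Tensor:
        # Map timestep t in [0, num_steps) to an expert index.
        idx = min(t * len(self.experts) // self.num_steps, len(self.experts) - 1)
        return self.experts[idx](x_t, t)
```

A prompt-tailored variant of this idea would replace the timestep-based index with a router over an embedding of the input prompt, so that each pruned expert serves the family of prompts it specializes in.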