Alternating Optimization: Constrained Problems, Adversarial Networks, and Robust Models
Publication or External Link
Data-driven machine learning methods have achieved impressive performance for many industrial applications and academic tasks. Machine learning methods usually have two stages: training a model from large-scale samples, and inference on new samples after the model is deployed. The training of modern models relies on solving difficult optimization problems that involve nonconvex, nondifferentiable objective functions and constraints, which is sometimes slow and often requires expertise to tune hyperparameters. While inference is much faster than training, it is often not fast enough for real-time applications.We focus on machine learning problems that can be formulated as a minimax problem in training, and study alternating optimization methods served as fast, scalable, stable and automated solvers.
First, we focus on the alternating direction method of multipliers (ADMM) for constrained problem in classical convex and nonconvex optimization. Some popular machine learning applications including sparse and low-rank models, regularized linear models, total variation image processing, semidefinite programming, and consensus distributed computing. We propose adaptive ADMM (AADMM), which is a fully automated solver achieving fast practical convergence by adapting the only free parameter in ADMM. We further automate several variants of ADMM (relaxed ADMM, multi-block ADMM and consensus ADMM), and prove convergence rate guarantees that are widely applicable to variants of ADMM with changing parameters. We release the fast implementation for more than ten applications and validate the efficiency with several benchmark datasets for each application. Second, we focus on the minimax problem of generative adversarial networks (GAN). We apply prediction steps to stabilize stochastic alternating methods for the training of GANs, and demonstrate advantages of GAN-based losses for image processing tasks. We also propose GAN-based knowledge distillation methods to train small neural networks for inference acceleration, and empirically study the trade-off between acceleration and accuracy.Third, we present preliminary results on adversarial training for robust models. We study fast algorithms for the attack and defense for universal perturbations, and then explore network architectures to boost robustness.