Control Theory-Inspired Acceleration of the Gradient-Descent Method: Centralized and Distributed

Loading...
Thumbnail Image

Publication or External Link

Date

2022

Citation

Abstract

Mathematical optimization problems are prevalent across various disciplines in science and engineering. Particularly in electrical engineering, convex and non-convex optimization problems are well-known in signal processing, estimation, control, and machine learning research. In many of these contemporary applications, the data points are dispersed over several sources. Restrictions such as industrial competition, administrative regulations, and user privacy have motivated significant research on distributed optimization algorithms for solving such data-driven modeling problems. The traditional gradient-descent method can solve optimization problems with differentiable cost functions. However, the speed of convergence of the gradient-descent method and its accelerated variants is highly influenced by the conditioning of the optimization problem being solved. Specifically, when the cost is ill-conditioned, these methods (i) require many iterations to converge and (ii) are highly unstable against process noise. In this dissertation, we propose novel optimization algorithms, inspired by control-theoretic tools, that can significantly attenuate the influence of the problem's conditioning.

First, we consider solving the linear regression problem in a distributed server-agent network. We propose the Iteratively Pre-conditioned Gradient-Descent (IPG) algorithm to mitigate the deleterious impact of the data points' conditioning on the convergence rate. We show that the IPG algorithm has an improved rate of convergence in comparison to both the classical and the accelerated gradient-descent methods. We further study the robustness of IPG against system noise and extend the idea of iterative pre-conditioning to stochastic settings, where the server updates the estimate based on a randomly selected data point at every iteration. In the same distributed environment, we present theoretical results on the local convergence of IPG for solving convex optimization problems. Next, we consider solving a system of linear equations in peer-to-peer multi-agent networks and propose a decentralized pre-conditioning technique. The proposed algorithm converges linearly, with an improved convergence rate than the decentralized gradient-descent. Considering the practical scenario where the computations performed by the agents are corrupted, or a communication delay exists between them, we study the robustness guarantee of the proposed algorithm and a variant of it. We apply the proposed algorithm for solving decentralized state estimation problems. Further, we develop a generic framework for adaptive gradient methods that solve non-convex optimization problems. Here, we model the adaptive gradient methods in a state-space framework, which allows us to exploit control-theoretic methodology in analyzing Adam and its prominent variants. We then utilize the classical transfer function paradigm to propose new variants of a few existing adaptive gradient methods. Applications on benchmark machine learning tasks demonstrate our proposed algorithms' efficiency. Our findings suggest further exploration of the existing tools from control theory in complex machine learning problems.

The dissertation is concluded by showing that the potential in the previously mentioned idea of IPG goes beyond solving generic optimization problems through the development of a novel distributed beamforming algorithm and a novel observer for nonlinear dynamical systems, where IPG's robustness serves as a foundation in our designs. The proposed IPG for distributed beamforming (IPG-DB) facilitates a rapid establishment of communication links with far-field targets while jamming potential adversaries without assuming any feedback from the receivers, subject to unknown multipath fading in realistic environments. The proposed IPG observer utilizes a non-symmetric pre-conditioner, like IPG, as an approximation of the observability mapping's inverse Jacobian such that it asymptotically replicates the Newton observer with an additional advantage of enhanced robustness against measurement noise. Empirical results are presented, demonstrating both of these methods' efficiency compared to the existing methodologies.

Notes

Rights