Learning in Engineered Multi-agent Systems
Baras, John S
Consider the problem of maximizing the total power produced by a wind farm. Because of aerodynamic interactions between wind turbines, letting each turbine maximize its individual power, as present-day wind farms do, does not lead to optimal farm-level power capture. Moreover, no accurate models of these aerodynamic interactions are available, which renders model-based optimization techniques ineffective. Model-free distributed algorithms are therefore needed that let turbines adapt their power production online so as to maximize farm-level power capture.

Motivated by such problems, this dissertation focuses on a distributed model-free optimization problem in the context of multi-agent systems. The set-up consists of a fixed number of agents, each of which can pick an action and observe the value of its individual utility function. An agent's utility function may depend on the collective action taken by all agents. The exact functional form (or model) of the agent utility functions, however, is unknown; an agent can only measure the numerical value of its utility. The objective of the multi-agent system is to optimize the welfare function, i.e., the sum of the individual utility functions. Such a collaborative task requires communication between agents, and we allow for such inter-agent communication. We also examine how the pattern of this information exchange affects certain aspects of performance.

We develop two algorithms to solve this problem. The first, the engineered Interactive Trial and Error Learning (eITEL) algorithm, builds on a line of work in the Learning in Games literature and applies when agent actions are drawn from finite sets. Although the setting is model-free, we introduce a novel qualitative graph-theoretic framework that encodes known directed interactions of the form "which agents' actions affect which other agents' payoffs" (the interaction graph).
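To make the set-up concrete, the following sketch builds a hypothetical three-agent toy instance. Each utility is a black box that returns only a number; the interaction graph has an edge (j, i) whenever agent j's action enters agent i's utility; and the welfare-optimizing set is found by brute force. The utilities, action set, and helper names (`DEPENDS_ON`, `interaction_edges`, `welfare_optimizing_set`) are illustrative assumptions. Exhaustive enumeration appears here only to pin down what "welfare-optimizing set" means. It is exactly the kind of centralized computation that a distributed, model-free algorithm such as eITEL is meant to avoid.

```python
from itertools import product

ACTIONS = (0, 1, 2)  # hypothetical finite action set shared by all agents


# Black-box utilities: an agent can evaluate its own utility but does not know its form.
def u0(a):
    return -(a[0] - a[1]) ** 2      # depends on agents 0 and 1


def u1(a):
    return -(a[1] - 2) ** 2         # depends on agent 1 only


def u2(a):
    return -abs(a[2] - a[0])        # depends on agents 2 and 0


UTILITIES = [u0, u1, u2]
# Known qualitative structure: the agents whose actions enter each agent's utility.
DEPENDS_ON = {0: {0, 1}, 1: {1}, 2: {0, 2}}


def interaction_edges(depends_on):
    """Directed edge (j, i) whenever agent j's action affects agent i's utility, j != i."""
    return {(j, i) for i, deps in depends_on.items() for j in deps if j != i}


def welfare(a):
    """Welfare = sum of the individual utilities."""
    return sum(u(a) for u in UTILITIES)


def welfare_optimizing_set(n_agents):
    """Brute force over all joint actions (toy sizes only, centralized)."""
    joint = list(product(ACTIONS, repeat=n_agents))
    best = max(welfare(a) for a in joint)
    return {a for a in joint if welfare(a) == best}


print(sorted(interaction_edges(DEPENDS_ON)))
print(welfare_optimizing_set(len(UTILITIES)))
```

Here agent 1's action affects agent 0's payoff and agent 0's action affects agent 2's payoff. The only joint action at which all three utilities reach their maximum of zero simultaneously is (2, 2, 2).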
We encode explicit inter-agent communication in a directed graph (the communication graph) and, under certain conditions, prove that the agents' joint action under eITEL converges to the welfare-optimizing set. The main condition requires the union of the interaction and communication graphs to be strongly connected. The algorithm thus combines an implicit form of communication (interaction through the utility functions) with explicit inter-agent communication to achieve the collaborative goal. This work is related to evolutionary computation techniques such as Simulated Annealing: the steps of the algorithm are designed so that they define an ergodic Markov chain whose stationary distribution, in the limit of small perturbations, concentrates on the states in which the agents' joint action optimizes the welfare function. The main analysis tool is the theory of perturbed Markov chains, and we also derive results about such chains that are of broader interest. The second algorithm, Collaborative Extremum Seeking (CES), uses techniques from extremum seeking control to solve the problem when agent actions are real numbers. In this case, assuming that the welfare function has a local minimizer and that the undirected communication graph between agents is connected, we prove that the joint action converges to a small neighborhood of a local optimizer of the welfare function. Because extremum seeking control estimates the gradient and descends along it simultaneously, CES exploits the gradient information available in the continuous-action formulation, which yields improved convergence speed. We evaluate the effectiveness of this algorithm on the wind farm power maximization problem through simulations. Lastly, we turn to a different question, namely the role of the information exchange pattern in the performance of distributed control systems, through a case study of the vehicle platooning problem.
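The main convergence condition for eITEL is strong connectivity of the union of the interaction and communication graphs. That condition can be checked with two breadth-first searches from any node, one over the edges and one over the reversed edges. The sketch below assumes that agents are labelled 0..n-1 and that each graph is given as a set of directed edges (j, i). The example graphs and the helper names `reachable` and `is_strongly_connected` are hypothetical.

```python
from collections import deque


def reachable(start, edges):
    """Nodes reachable from `start` along directed edges (u, v), via breadth-first search."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def is_strongly_connected(n, edges):
    """True if every agent 0..n-1 can reach, and be reached from, agent 0."""
    if n <= 1:
        return True
    forward = reachable(0, edges)
    backward = reachable(0, {(v, u) for u, v in edges})
    return len(forward) == n and len(backward) == n


# Hypothetical graphs on three agents; edge (j, i) means information flows from j to i.
INTERACTION = {(1, 0), (0, 2)}      # agent 1 affects agent 0's payoff, agent 0 affects agent 2's
COMMUNICATION = {(2, 1)}            # agent 2 sends messages to agent 1
print(is_strongly_connected(3, INTERACTION))                    # interaction graph alone
print(is_strongly_connected(3, INTERACTION | COMMUNICATION))    # union of both graphs
```

In this toy instance, neither graph is strongly connected on its own. Their union, however, forms the cycle 1 -> 0 -> 2 -> 1. This shows how explicit messages can close the gaps left by the implicit communication carried through the utilities.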
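The Markov-chain idea can be illustrated, in the spirit of Simulated Annealing, with a Metropolis chain on a hypothetical two-agent problem. This is not the eITEL algorithm. At each step, one agent is picked uniformly at random and proposes a different action. An uphill move in welfare is always accepted, and a downhill move is accepted with probability exp(dW / T). For every T > 0 the chain is ergodic, so its stationary distribution has full support. That distribution is the Gibbs distribution proportional to exp(W / T), which concentrates on the welfare maximizer as T decreases toward 0. In this sketch the temperature T plays the role of the perturbation parameter. The welfare function and all names are assumptions.

```python
import math
from itertools import product

ACTIONS = (0, 1, 2)
STATES = list(product(ACTIONS, repeat=2))  # joint actions of two agents
INDEX = {s: k for k, s in enumerate(STATES)}


def welfare(a):
    """Hypothetical welfare with unique maximizer (2, 1)."""
    return -(a[0] - 2) ** 2 - (a[1] - 1) ** 2


def transition_matrix(T):
    """Metropolis chain: pick an agent uniformly, propose one of its other actions."""
    n = len(STATES)
    P = [[0.0] * n for _ in range(n)]
    for s in STATES:
        i = INDEX[s]
        for agent in range(2):
            for b in ACTIONS:
                if b == s[agent]:
                    continue
                t = tuple(b if k == agent else s[k] for k in range(2))
                # Proposal probability 1/2 * 1/2; accept uphill always, downhill w.p. exp(dW/T).
                accept = min(1.0, math.exp((welfare(t) - welfare(s)) / T))
                P[i][INDEX[t]] += 0.25 * accept
        P[i][i] = 1.0 - sum(P[i])  # rejected proposals keep the current joint action
    return P


def stationary(P, iters=2000):
    """Power iteration pi <- pi P from the uniform distribution."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def gibbs(T):
    """Closed-form stationary distribution, proportional to exp(W / T)."""
    w = [math.exp(welfare(s) / T) for s in STATES]
    z = sum(w)
    return [x / z for x in w]


for T in (1.0, 0.3, 0.1):
    pi = stationary(transition_matrix(T))
    print(T, round(pi[INDEX[(2, 1)]], 4))  # probability mass on the maximizer
```

Because the proposal is symmetric, the acceptance rule satisfies detailed balance with respect to exp(W / T), so power iteration and the closed form agree. The printed mass on (2, 1) grows as T shrinks. A real learning algorithm must also avoid getting trapped at local maxima, which is where the careful step design described above matters.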
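For continuous actions, the sketch below shows classic discrete-time extremum seeking with sinusoidal perturbations. It is not the CES algorithm itself. Each agent adds a small sinusoidal dither at its own frequency and multiplies the measured cost by that dither, which on average yields a gradient estimate. The agent then steps against this estimate, so gradient estimation and descent happen simultaneously. The sketch makes several simplifying assumptions:

- Every agent directly measures the total cost. In CES this information would presumably have to be assembled through communication over the graph, which the sketch does not model.
- The cost is a hypothetical quadratic with minimizer (1, -2).
- The high-pass (washout) filter usually placed before demodulation is omitted.
- The parameters `a`, `gamma`, and `omegas` are illustrative choices.

```python
import math


def cost(u):
    """Hypothetical cost (negated welfare); the agents only ever see its value."""
    return (u[0] - 1.0) ** 2 + (u[1] + 2.0) ** 2


def extremum_seeking(steps=20000, a=0.2, gamma=0.1, omegas=(0.8, 1.1)):
    """Discrete-time perturbation-based extremum seeking for two agents."""
    u_hat = [0.0, 0.0]  # each agent's current estimate of its best action
    for k in range(steps):
        # Each agent dithers its action with its own sinusoid.
        dither = [a * math.sin(w * k) for w in omegas]
        u = [u_hat[i] + dither[i] for i in range(len(u_hat))]
        y = cost(u)  # single scalar measurement of the perturbed joint action
        for i in range(len(u_hat)):
            # Demodulation: on average dither[i] * y is about (a**2 / 2) * dJ/du_i,
            # so this is a gradient-descent step taken simultaneously with estimation.
            u_hat[i] -= gamma * dither[i] * y
    return u_hat


u_star = extremum_seeking()
print([round(x, 2) for x in u_star])
```

For this quadratic, the averaged error shrinks by a factor of about 1 - gamma * a**2 per step. Distinct dither frequencies keep each agent's demodulated signal from picking up the other agent's dither on average. The residual wobble around the minimizer scales with the dither amplitude `a`, which is why convergence is only to a small neighborhood.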
In the vehicle platoon control problem, the objective is to design distributed control laws for the individual vehicles in a platoon (or road train) that regulate inter-vehicle distances at a specified safe value while the entire platoon follows a leader vehicle. Most of the literature on this problem addresses shortcomings in control performance that arise when information is exchanged only between nearest neighbors. We instead consider an arbitrary graph as the information exchange pattern and derive how a certain indicator of control performance depends on that pattern. Such analysis helps in understanding the qualitative features of the "right" information pattern for this problem.
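A minimal simulation can show how the information pattern changes a performance indicator in a platoon. The assumptions below are illustrative and are not taken from the abstract:

- Vehicles are double integrators.
- The information graph is undirected.
- Each follower applies a linear consensus-type law with hypothetical gains `kp` and `kv` on relative position and velocity errors.
- The leader moves at constant speed.
- The indicator is the integrated squared spacing error after every follower starts 1 m too far back.

The two patterns compared are bidirectional nearest-neighbor exchange with only the first follower seeing the leader, and the same graph with every follower also seeing the leader.

```python
def path_neighbors(n):
    """Bidirectional nearest-neighbor graph: follower i talks to i-1 and i+1."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def simulate_platoon(neighbors, leader_links, n=5, gap=10.0, v0=20.0,
                     kp=1.0, kv=2.0, dt=0.01, horizon=100.0, offset=1.0):
    """Return (integrated squared spacing error, final max |spacing error|)."""
    x_lead = 0.0
    # Follower i should sit at x_lead - (i + 1) * gap; all start `offset` meters too far back.
    x = [x_lead - (i + 1) * gap - offset for i in range(n)]
    v = [v0] * n
    cost = 0.0
    for _ in range(int(round(horizon / dt))):
        p = [x[i] - (x_lead - (i + 1) * gap) for i in range(n)]  # spacing errors
        acc = []
        for i in range(n):
            a_i = 0.0
            for j in neighbors[i]:
                # Relative spacing error to neighbor j, built from relative measurements only.
                rel = (x[i] - x[j]) + (i - j) * gap
                a_i -= kp * rel + kv * (v[i] - v[j])
            if i in leader_links:
                a_i -= kp * p[i] + kv * (v[i] - v0)
            acc.append(a_i)
        cost += dt * sum(e * e for e in p)
        # Explicit Euler step for followers and leader.
        x = [x[i] + dt * v[i] for i in range(n)]
        v = [v[i] + dt * acc[i] for i in range(n)]
        x_lead += dt * v0
    p = [x[i] - (x_lead - (i + 1) * gap) for i in range(n)]
    return cost, max(abs(e) for e in p)


NEAREST = path_neighbors(5)
cost_nn, err_nn = simulate_platoon(NEAREST, leader_links={0})
cost_all, err_all = simulate_platoon(NEAREST, leader_links=set(range(5)))
print(f"nearest-neighbor only:    cost={cost_nn:.2f}, final error={err_nn:.1e}")
print(f"all followers see leader: cost={cost_all:.2f}, final error={err_all:.1e}")
```

Under these assumptions, the spacing errors obey p'' = -kp L_g p - kv L_g p', where L_g is the graph Laplacian plus a diagonal leader-link term. The slowest mode is therefore governed by the smallest eigenvalue of L_g. For the nearest-neighbor path of five followers, that eigenvalue is 2 - 2cos(π/11) ≈ 0.081, versus 1 when every follower sees the leader. This is why the first cost comes out roughly three times larger. Both platoons still settle eventually.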