Multi-Agent Reinforcement Learning: Systems for Evaluation and Applications to Complex Systems

Terry_umd_0117E_23151.pdf (2.51 MB)

Reinforcement learning is a field of artificial intelligence that studies how agents can learn, by trial and error, which actions to take in a given system. Famous examples include learning to control real robots and achieving superhuman performance in most of the popular and challenging games that humans play.

To conduct research in this space, researchers use standardized "environments", such as robotics simulations or video games, to evaluate the performance of learning methods. This thesis covers three such tools: PettingZoo, a now widely used library that offers a standardized API and a set of reference environments for multi-agent reinforcement learning; SuperSuit, a library that offers easy-to-use, standardized preprocessing wrappers for interfacing with learning libraries; and extensions to the Arcade Learning Environment (a popular tool that reinforcement learning researchers use to interact with Atari 2600 games) that add support for multiplayer game modes.
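As a rough illustration of what a standardized multi-agent environment interface and a preprocessing wrapper can look like, here is a minimal pure-Python sketch. It is not PettingZoo's or SuperSuit's actual API: the toy game and every name in it (`TwoPlayerCountingEnv`, `NormalizeObservation`, `run_episode`, `current_agent`, `observe`) are hypothetical and chosen only for this example.

```python
from typing import Callable, Dict, List


class TwoPlayerCountingEnv:
    """Hypothetical toy environment: agents take turns adding 1 or 2 to a
    shared counter; whoever brings it to the target (or beyond) wins."""

    def __init__(self, target: int = 5) -> None:
        self.agents: List[str] = ["player_0", "player_1"]
        self.target = target
        self.reset()

    def reset(self) -> None:
        self.counter = 0
        self.turn = 0
        self.done = False
        self.rewards: Dict[str, float] = {a: 0.0 for a in self.agents}

    @property
    def current_agent(self) -> str:
        # Agents act strictly in turn order.
        return self.agents[self.turn % len(self.agents)]

    def observe(self, agent: str) -> float:
        # Every agent sees the same shared counter in this toy game.
        return float(self.counter)

    def step(self, action: int) -> None:
        if self.done:
            raise RuntimeError("episode finished; call reset()")
        if action not in (1, 2):
            raise ValueError("action must be 1 or 2")
        mover = self.current_agent
        self.counter += action
        if self.counter >= self.target:
            self.done = True
            for a in self.agents:
                self.rewards[a] = 1.0 if a == mover else -1.0
        self.turn += 1


class NormalizeObservation:
    """Hypothetical preprocessing wrapper: rescales observations to [0, 1]
    and forwards everything else to the wrapped environment."""

    def __init__(self, env: TwoPlayerCountingEnv) -> None:
        self.env = env

    def __getattr__(self, name: str):
        # Called only for attributes not defined on the wrapper itself.
        return getattr(self.env, name)

    def observe(self, agent: str) -> float:
        return min(self.env.observe(agent) / self.env.target, 1.0)


def run_episode(env, policies: Dict[str, Callable[[float], int]]) -> Dict[str, float]:
    """Play one episode; the same loop works for wrapped and unwrapped envs."""
    env.reset()
    while not env.done:
        agent = env.current_agent
        env.step(policies[agent](env.observe(agent)))
    return dict(env.rewards)


if __name__ == "__main__":
    env = NormalizeObservation(TwoPlayerCountingEnv(target=5))
    policies = {agent: (lambda obs: 1) for agent in env.agents}
    print(run_episode(env, policies))
```

Because the loop in `run_episode` relies only on the shared interface, the same code drives any environment or wrapper that exposes it. That is the practical benefit a standardized API is meant to provide.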

Using these tools, this thesis also applies multi-agent reinforcement learning (MARL) to develop a new tool for natural science research. Emergent behaviors are the coordinated behaviors of groups of agents, such as pedestrians in a crosswalk, birds in flocking formations, cars in traffic, or traders in the stock market, and they are among the most important phenomena that remain poorly understood across many fields of science. In this work, we introduce the first mathematical formalism for the systematic search of all possible good ("mature") emergent behaviors within a multi-agent system through MARL, and create a naive implementation of this search via deep reinforcement learning that can be applied in arbitrary environments. We show that in 12 multi-agent systems, this naive method finds over a hundred emergent behaviors in total, the majority of which were previously unknown to the environment authors. Such methods could help answer open scientific questions such as "What behaviors are possible in this system?", "What specific conditions in this system allow for this kind of emergent behavior?", or "How can I change this system to prevent this emergent behavior?"