Synthesis of Strategies for Non-Zero-Sum Repeated Games
Files
Publication or External Link
Date
Authors
Advisor
Citation
DRUM DOI
Abstract
There are numerous applications that involve two or more self-interested autonomous agents that repeatedly interact with each other in order to achieve a goal or maximize their utilities. This dissertation focuses on the problem of how to identify and exploit useful structures in agents' behavior for the construction of good strategies for agents in multi-agent environments, particularly non-zero-sum repeated games. This dissertation makes four contributions to the study of this problem. First, this thesis describes a way to take a set of interaction traces produced by different pairs of players in a two-player repeated game, and then find the best way to combine them into a strategy. The strategy can then be incorporated into an existing agent, as an enhancement of the agent's original strategy. In cross-validated experiments involving 126 agents for the Iterated Prisoner's Dilemma, Iterated Chicken Game, and Iterated Battle of the Sexes, my technique was able to make improvement to the performance of nearly all of the agents. Second, this thesis investigates the issue of uncertainty about goals when a goal-based agent situated in a nondeterministic environment. The results of this investigation include the necessary and sufficiency conditions for such guarantee, and an algorithm for synthesizing a strategy from interaction traces that maximizes the probability of success of an agent even when no strategy can assure the success of the agent. Third, this thesis introduces a technique, Symbolic Noise Detection (SND), for detecting noise (i.e., mistakes or miscommunications) among agents in repeated games. The idea is that if we can build a model of the other agent's behavior, we can use this model to detect and correct actions that have been affected by noise. In the 20th Anniversary Iterated Prisoner's Dilemma competition, the SND agent placed third in the "noise" category, and was the best performer among programs that had no "slave" programs feeding points to them. Fourth, the thesis presents a generalization of SND that can be wrapped around any existing strategy. Finally, the thesis includes a general framework for synthesizing strategies from experience for repeated games in both noisy and noisy-free environments.