Markov Games: Receding Horizon Approach
Abstract
We consider a receding horizon approach as an approximate solution to two-person zero-sum Markov games with infinite-horizon discounted cost and average cost criteria.
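To make the setting concrete, here is a minimal sketch of receding horizon control for the discounted case, assuming a small tabular game model; the names P, cost, gamma, and H are illustrative, not the thesis's notation. At each decision epoch the minimizer solves the H-stage subgame rooted at the current state by backward induction over stage matrix games and plays the first-stage minimax strategy.

```python
# A minimal sketch, not the thesis's algorithm: receding horizon control for
# a tabular two-person zero-sum Markov game with discounted cost.  P[x, a, b]
# is a distribution over next states, cost[x, a, b] the stage cost, gamma the
# discount factor, H the receding horizon; all names are illustrative.
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(C):
    """Value and minimizer's mixed strategy of the zero-sum matrix game
    min_x max_y x^T C y, solved as a linear program."""
    m, n = C.shape
    c = np.r_[np.zeros(m), 1.0]                  # variables x_1..x_m, v; min v
    A_ub = np.c_[C.T, -np.ones(n)]               # for all j: sum_i x_i C[i,j] <= v
    A_eq = np.r_[np.ones(m), 0.0].reshape(1, -1) # x is a probability vector
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[-1], res.x[:m]

def receding_horizon_action(s, P, cost, gamma, H, rng):
    """Solve the H-stage subgame rooted at state s by backward induction over
    stage matrix games; play the minimizer's first-stage minimax strategy."""
    S = cost.shape[0]
    V = np.zeros(S)                              # value-to-go, terminal value 0
    for _ in range(H):                           # one backward-induction sweep
        V_new = np.empty(S)
        for x in range(S):
            # Q[a, b] = stage cost + discounted expected value-to-go.
            Q = cost[x] + gamma * P[x] @ V
            V_new[x], strat = matrix_game_value(Q)
            if x == s:
                pi = strat                       # first-stage strategy at s
        V = V_new
    pi = np.clip(pi, 0.0, None)
    return rng.choice(len(pi), p=pi / pi.sum())

# Example on a random game: replan from the current state at every step.
rng = np.random.default_rng(0)
S, A, B = 4, 2, 3
P = rng.dirichlet(np.ones(S), size=(S, A, B))
cost = rng.random((S, A, B))
a = receding_horizon_action(0, P, cost, gamma=0.9, H=6, rng=rng)
```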
We first present error bounds relative to the optimal equilibrium value of the game when both players take correlated equilibrium receding horizon policies that are based on exact or approximate solutions of receding finite-horizon subgames. Motivated by the worst-case optimal control of queueing systems by Altman, we then analyze error bounds when the minimizer plays the (approximate) receding horizon control and the maximizer plays the worst-case policy.
We give three heuristic examples of the approximate receding horizon control. We extend "rollout" by Bertsekas and Castanon and "parallel rollout" and "hindsight optimization" by Chang et al. into the Markov game setting within the framework of the approximate receding horizon approach and analyze their performance.
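As an illustration of the first two heuristics, the following Monte Carlo sketch (reusing the illustrative model above, with a fixed assumed adversary policy standing in for the worst-case maximizer) scores each of the minimizer's actions by one-step lookahead on the minimum of the base heuristics' simulated costs-to-go; with a single base policy it reduces to plain rollout.

```python
# A minimal Monte Carlo sketch of rollout / parallel rollout for the
# minimizer; `adversary` is an assumed fixed policy x -> b standing in for
# the worst-case maximizer, and `heuristics` is a list of base policies
# x -> a.  All names are illustrative, not the thesis's notation.
import numpy as np

def simulate_cost(x, policy, adversary, P, cost, gamma, H, rng):
    """Discounted H-step cost of `policy` against `adversary` from state x."""
    total, discount = 0.0, 1.0
    for _ in range(H):
        a, b = policy(x), adversary(x)
        total += discount * cost[x, a, b]
        x = rng.choice(P.shape[-1], p=P[x, a, b])
        discount *= gamma
    return total

def parallel_rollout_action(x, heuristics, adversary, P, cost, gamma, H,
                            rng, n_samples=32):
    """One-step lookahead whose cost-to-go is the minimum over the base
    heuristics' estimated values; one heuristic gives plain rollout."""
    S, A = P.shape[0], P.shape[1]
    b = adversary(x)                    # assumed fixed worst-case response
    q = np.empty(A)
    for a in range(A):
        tails = np.zeros(len(heuristics))
        for _ in range(n_samples):
            x1 = rng.choice(S, p=P[x, a, b])
            for i, h in enumerate(heuristics):
                tails[i] += simulate_cost(x1, h, adversary, P, cost,
                                          gamma, H - 1, rng) / n_samples
        q[a] = cost[x, a, b] + gamma * tails.min()
    return int(q.argmin())
```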
In the rollout and parallel rollout approaches, the minimizing player seeks either to improve the performance of a single heuristic policy that it rolls out, or to dynamically combine multiple heuristic policies in a set so as to improve the performance of all of the heuristic policies simultaneously, under the assumption that the maximizing player has chosen a fixed worst-case policy. Given $\epsilon > 0$, we give the value of the receding horizon which guarantees that the parallel rollout policy with that horizon, played by the minimizer, dominates any heuristic policy in the set by $\epsilon$. In the hindsight optimization approach, the minimizing player makes a decision based on its expected optimal hindsight performance over a finite horizon.
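A corresponding sketch of hindsight optimization, under the simplifying (and here assumed) device that the adversary's future actions are drawn uniformly at random: each first action is scored by the expected optimal cost the minimizer could achieve if the sampled adversary sequence were known in advance, computed exactly by backward induction.

```python
# A minimal sketch of hindsight optimization for the minimizer, reusing the
# illustrative tabular model; sampling the adversary's action sequence
# uniformly at random is an assumption of this sketch, not the thesis's model.
import numpy as np

def hindsight_value(b_seq, P, cost, gamma):
    """Minimizer's optimal value-to-go when the adversary's action sequence
    is known in advance: backward induction over a finite-horizon MDP."""
    V = np.zeros(P.shape[0])
    for b in reversed(b_seq):
        # Q[x, a] = cost(x, a, b) + gamma * E[V(next state) | x, a, b]
        Q = cost[:, :, b] + gamma * P[:, :, b] @ V
        V = Q.min(axis=1)
    return V

def hindsight_action(x, P, cost, gamma, H, rng, n_samples=32):
    """Score each first action by its expected optimal hindsight cost over
    sampled adversary sequences; play the minimizing action."""
    A, B = P.shape[1], P.shape[2]
    q = np.zeros(A)
    for _ in range(n_samples):
        b_seq = rng.integers(B, size=H)        # sampled adversary sequence
        V1 = hindsight_value(b_seq[1:], P, cost, gamma)
        q += (cost[x, :, b_seq[0]] + gamma * P[x, :, b_seq[0]] @ V1) / n_samples
    return int(q.argmin())
```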
Finally, we discuss practical implementations of the receding horizon approaches via simulation.