Approximate Receding Horizon Approach for Markov Decision Processes: Average Reward Case
Publication or External Link
Building on the receding horizon approach by Hernandez-Lerma andLasserre in solving Markov decision processes (MDPs),this paper first analyzes the performance of the (approximate) receding horizon approach in terms of infinite horizon average reward.
In this approach, we fix a finite horizon and at each decision time, we solve the given MDP with the finite horizon for an approximately optimal current action and take the action to control the MDP.
We then analyze recently proposed on-line policy improvementscheme, "roll-out," by Bertsekas and Castanon, and a generalization of the rollout algorithm, "parallel rollout" by Chang et al., in terms of the infinite horizon average reward in the framework of the (approximate) receding horizon control.
We finally discuss practical implementations of these schemes via simulation.