Risk-Sensitive Probability for Markov Chains
The probability distribution of a Markov chain is viewed as the information state of an additive optimization problem. This optimization problem is then generalized to a product form whose information state gives rise to ...
An Adaptive Sampling Algorithm for Solving Markov Decision Processes
Based on recent results for multi-armed bandit problems, we propose an adaptive sampling algorithm that approximates the optimal value of a finite-horizon Markov decision process (MDP) with an infinite state space but finite ...
Evolutionary Policy Iteration for Solving Markov Decision Processes
We propose a novel algorithm called Evolutionary Policy Iteration (EPI) for solving infinite-horizon discounted-reward Markov decision process (MDP) problems. EPI inherits the spirit of the well-known policy iteration (PI) algorithm but ...