Browsing by Author "Chang, Hyeong Soo"
Item An Adaptive Sampling Algorithm for Solving Markov Decision Processes (2002) Chang, Hyeong Soo; Fu, Michael C.; Marcus, Steven I.; ISR
Based on recent results for multi-armed bandit problems, we propose an adaptive sampling algorithm that approximates the optimal value of a finite horizon Markov decision process (MDP) with infinite state space but finite action space and bounded rewards. The algorithm adaptively chooses which action to sample as the sampling process proceeds, and it is proven that the estimate produced by the algorithm is asymptotically unbiased and the worst possible bias is bounded by a quantity that converges to zero at rate $O\left(\frac{H \ln N}{N}\right)$, where $H$ is the horizon length and $N$ is the total number of samples that are used per state sampled in each stage. The worst-case running-time complexity of the algorithm is $O((|A|N)^H)$, independent of the state space size, where $|A|$ is the size of the action space. The algorithm can be used to create an approximate receding horizon control to solve infinite horizon MDPs.
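The sampling rule in this item follows the upper confidence bound (UCB) idea from multi-armed bandits. As a rough illustration of that flavor of adaptive multi-stage sampling (a minimal sketch, not the paper's exact algorithm), the following Python assumes a generative model sample(s, a) returning a reward and next state, and that N is at least the number of actions; all names are illustrative:

```python
import math

def ams_value(s, h, N, actions, sample):
    """Estimate the h-stage-to-go optimal value at state s by adaptively
    allocating N samples among actions with a UCB rule (sketch)."""
    if h == 0:
        return 0.0
    n = {a: 0 for a in actions}        # per-action sample counts
    q = {a: 0.0 for a in actions}      # running per-action value estimates
    for a in actions:                  # sample each action once to initialize
        r, s2 = sample(s, a)
        q[a] = r + ams_value(s2, h - 1, N, actions, sample)
        n[a] = 1
    for t in range(len(actions), N):
        # choose the action with the largest upper confidence bound
        a = max(actions, key=lambda a: q[a] + math.sqrt(2 * math.log(t) / n[a]))
        r, s2 = sample(s, a)
        # incremental average of the sampled Q-value for that action
        q[a] += (r + ams_value(s2, h - 1, N, actions, sample) - q[a]) / (n[a] + 1)
        n[a] += 1
    # value estimate: count-weighted average of the per-action estimates
    return sum(n[a] * q[a] for a in actions) / N
```

The recursion branches over all actions and samples at every stage, matching the $O((|A|N)^H)$ worst-case complexity stated in the abstract.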
Item Approximate Receding Horizon Approach for Markov Decision Processes: Average Reward Case (2001) Chang, Hyeong Soo; Marcus, Steven I.; ISR
Building on the receding horizon approach by Hernandez-Lerma and Lasserre in solving Markov decision processes (MDPs), this paper first analyzes the performance of the (approximate) receding horizon approach in terms of infinite horizon average reward. In this approach, we fix a finite horizon and at each decision time, we solve the given MDP with the finite horizon for an approximately optimal current action and take the action to control the MDP. We then analyze the recently proposed on-line policy improvement scheme, "rollout," by Bertsekas and Castanon, and a generalization of the rollout algorithm, "parallel rollout," by Chang et al., in terms of the infinite horizon average reward in the framework of the (approximate) receding horizon control.
We finally discuss practical implementations of these schemes via simulation.
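Operationally, the receding horizon scheme analyzed above is a simple loop: fix a horizon, (approximately) solve a finite-horizon problem from the current state at every decision time, and apply only the first action. A minimal sketch, assuming a user-supplied finite-horizon solver and environment step function (both names are illustrative, not the paper's):

```python
def receding_horizon_control(s0, H, steps, finite_horizon_action, step_env):
    """Generic receding horizon loop (sketch): at every decision time,
    (approximately) solve an H-horizon problem from the current state and
    apply only the first action of the resulting plan."""
    s, total_reward = s0, 0.0
    for _ in range(steps):
        a = finite_horizon_action(s, H)   # e.g., backward induction, or the
                                          # sampling estimator sketched above
        r, s = step_env(s, a)             # apply the action to the real system
        total_reward += r
    return total_reward / steps           # empirical average reward
```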
Item An Asymptotically Efficient Algorithm for Finite Horizon Stochastic Dynamic Programming Problems (2003) Chang, Hyeong Soo; Fu, Michael C.; Marcus, Steven I.; ISR
We present a novel algorithm, called "Simulated Annealing Multiplicative Weights," for approximately solving large finite-horizon stochastic dynamic programming problems. The algorithm is "asymptotically efficient" in the sense that a finite-time bound for the sample mean of the optimal value function over a given finite policy space can be obtained, and the bound approaches the optimal value as the number of iterations increases. The algorithm updates a probability distribution over the given policy space with a very simple rule, and the sequence of distributions generated by the algorithm converges to a distribution concentrated only on the optimal policies for the given policy space. We also discuss how to reduce the computational cost of the algorithm to apply it in practice.
Item A Distributed Algorithm for Solving a Class of Multi-agent Markov Decision Problems (2003) Chang, Hyeong Soo; Fu, Michael C.; ISR
We consider a class of infinite horizon Markov decision processes (MDPs) with multiple decision makers, called agents, and a general joint reward structure, but a special decomposable state/action structure such that each individual agent's actions affect the system's state transitions independently from the actions of all other agents. We introduce the concept of "localization," where each agent need only consider a "local" MDP defined on its own state and action spaces. Based on this localization concept, we propose an iterative distributed algorithm that emulates gradient ascent and which converges to a locally optimal solution for the average reward case. The solution is an "autonomous" joint policy such that each agent's action is based on only its local state. Finally, we discuss the implication of the localization concept for discounted reward problems.
Item Evolutionary Policy Iteration for Solving Markov Decision Processes (2002) Chang, Hyeong Soo; Lee, Hong-Gi; Fu, Michael C.; Marcus, Steven I.; ISR; CSHCN
We propose a novel algorithm called Evolutionary Policy Iteration (EPI) for solving infinite horizon discounted reward Markov Decision Process (MDP) problems. EPI inherits the spirit of the well-known PI algorithm but eliminates the need to maximize over the entire action space in the policy improvement step, so it should be most effective for problems with very large action spaces. EPI iteratively generates a "population," or set of policies, such that the performance of the "elite policy" for a population is monotonically improved with respect to a defined fitness function. EPI converges with probability one to a population whose elite policy is an optimal policy for a given MDP. EPI is naturally parallelizable, and a distributed variant is also studied.
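One concrete way to realize an "elite policy" of the kind EPI maintains is per-state policy switching, which cannot degrade any member of the population. A minimal sketch under that assumption, with exact policy evaluation by a linear solve (array shapes and names are illustrative, not the paper's notation):

```python
import numpy as np

def evaluate(P, R, policy, gamma):
    """Exact discounted value of a stationary policy:
    V = (I - gamma * P_pi)^(-1) r_pi.
    P has shape (n_states, n_actions, n_states); R has shape
    (n_states, n_actions); policy maps each state to an action index."""
    n = len(policy)
    P_pi = np.array([P[s, policy[s]] for s in range(n)])   # (n, n)
    r_pi = np.array([R[s, policy[s]] for s in range(n)])   # (n,)
    return np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)

def elite_policy(P, R, population, gamma):
    """Per-state policy switching (sketch): at each state, act like the
    member of the population whose value function is largest there. The
    resulting policy's value dominates every member's value at every state."""
    values = [evaluate(P, R, pi, gamma) for pi in population]
    n = P.shape[0]
    return [population[int(np.argmax([v[s] for v in values]))][s]
            for s in range(n)]
```

A full EPI-style loop would then generate new policies around this elite (e.g., by mutation) to form the next population and repeat.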
Item Markov Games: Receding Horizon Approach (2001) Chang, Hyeong Soo; Marcus, Steven I.; ISR
We consider a receding horizon approach as an approximate solution to two-person zero-sum Markov games with infinite horizon discounted cost and average cost criteria. We first present error bounds from the optimal equilibrium value of the game when both players take correlated equilibrium receding horizon policies that are based on \emph{exact} or \emph{approximate} solutions of receding finite horizon subgames. Motivated by the worst-case optimal control of queueing systems by Altman, we then analyze error bounds when the minimizer plays the (approximate) receding horizon control and the maximizer plays the worst-case policy.
We give three heuristic examples of the approximate receding horizon control. We extend "rollout" by Bertsekas and Castanon, and "parallel rollout" and "hindsight optimization" by Chang et al., into the Markov game setting within the framework of the approximate receding horizon approach, and analyze their performance.
In the rollout/parallel rollout approaches, the minimizing player seeks to improve the performance of a single heuristic policy it rolls out, or to dynamically combine multiple heuristic policies in a set so as to improve the performance of all of the heuristic policies simultaneously, under the assumption that the maximizing player has chosen a fixed worst-case policy. Given $\epsilon > 0$, we give the value of the receding horizon which guarantees that the parallel rollout policy with that horizon, played by the minimizer, \emph{dominates} any heuristic policy in the set by $\epsilon$. In the hindsight optimization approach, the minimizing player makes a decision based on its expected optimal hindsight performance over a finite horizon.
We finally discuss practical implementations of the receding horizon approaches via simulation.
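When the maximizer's policy is fixed to a worst-case choice, the minimizer faces an ordinary cost-minimizing MDP, and parallel rollout can be sketched directly. A minimal illustration, assuming a simulator sim_cost(s, a, pi, H) that returns a sampled H-horizon cost of taking action a at state s and then following heuristic policy pi (all names are assumptions, not the paper's):

```python
def parallel_rollout_action(s, actions, heuristics, H, n_sims, sim_cost):
    """Parallel rollout for a cost-minimizing controller (sketch).
    For each candidate action, estimate the H-horizon cost of taking it and
    then following each heuristic policy; credit the action with the best
    (smallest) heuristic estimate, and play the overall best action."""
    def est(a):
        per_pi = []
        for pi in heuristics:
            # Monte Carlo estimate of the cost-to-go under heuristic pi
            runs = [sim_cost(s, a, pi, H) for _ in range(n_sims)]
            per_pi.append(sum(runs) / n_sims)
        return min(per_pi)   # parallel rollout: best over the policy set
    return min(actions, key=est)
```

Rolling out a single heuristic is the special case where heuristics contains one policy.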
Item Multi-time Scale Markov Decision Processes (2002) Chang, Hyeong Soo; Fard, Pedram; Marcus, Steven I.; Shayman, Mark; ISR
This paper proposes a simple analytical model called the M time-scale Markov Decision Process (MMDP) for hierarchically structured sequential decision-making processes, where decisions at each level of the M-level hierarchy are made in M different time-scales. In this model, the state space and the control space of each level in the hierarchy are non-overlapping with those of the other levels, and the hierarchy is structured in a "pyramid" sense: a decision made at a level-m (slower time-scale) state, and/or the state itself, affects the decision-making process of the lower level m+1 (faster time-scale) until a new decision is made at the higher level, but the lower-level decisions themselves do not affect the higher level's transition dynamics. The performance produced by the lower level's decisions will, however, affect the higher level's decisions.
A hierarchical objective function is defined such that the single-step reward for the level-m decision-making process is the immediate reward at level m plus the finite-horizon value of following a (nonstationary) policy at level m+1 over one decision epoch of level m. From this we define a "multi-level optimal value function" and derive a "multi-level optimality equation."
We discuss how to solve MMDPs exactly or approximately, and also study heuristic on-line methods. Finally, we give some example control problems that can be modeled as MMDPs.
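The coupling described above can be made concrete: the slow level's one-step reward folds in the reward accumulated by the fast level's policy over one slow decision epoch. A minimal sketch, assuming user-supplied simulators and an initial fast-level state y0 (all names illustrative); in practice the expectation over the fast level's randomness would be estimated by averaging repeated runs:

```python
def slow_level_reward(x, u, y0, fast_policy, T, slow_reward, fast_step):
    """One-step reward for the slow (level-m) process (sketch): the immediate
    level-m reward plus the reward of running the level-(m+1) policy over the
    T fast steps that make up one slow decision epoch."""
    total = slow_reward(x, u)           # immediate reward at the slow level
    y = y0                              # fast-level state at the epoch start
    for t in range(T):
        a = fast_policy(t, y, x, u)     # nonstationary fast-level policy; may
                                        # depend on the slow state and decision
        r, y = fast_step(y, a, x, u)    # fast dynamics conditioned on (x, u)
        total += r
    return total
```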
Item On Convergence of Evolutionary Computation for Stochastic Combinatorial Optimization (2009-09) Chang, Hyeong Soo
Extending Rudolph's works on the convergence analysis of evolutionary computation (EC) for deterministic combinatorial optimization problems (COPs), this brief paper establishes a probability one convergence of some variants of explicit-averaging EC to an optimal solution and the optimal value for solving stochastic COPs.
Item A Short Note on Combining Multiple Policies in Risk-Sensitive Exponential Average Reward Markov Decision Processes (2009) Chang, Hyeong Soo
This short note presents a method of combining multiple policies in a given policy set such that the resulting policy improves all policies in the set for risk-sensitive exponential average reward Markov decision processes (MDPs), extending the work of Howard and Matheson for the singleton policy set case. Some applications of the method in solving risk-sensitive MDPs are also discussed.
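The note's combination method itself is not reproduced here, but the quantity being improved can be sketched. For an irreducible chain and risk parameter gamma != 0, the classical Howard-Matheson characterization gives the exponential average reward of a stationary policy as (1/gamma) times the log of the spectral radius of a reward-twisted transition matrix. A minimal evaluation sketch under those assumptions (array shapes and names illustrative):

```python
import numpy as np

def exp_average_reward(P, R, policy, gamma):
    """Risk-sensitive exponential average reward of a stationary policy
    (sketch, assuming an irreducible chain and gamma != 0): (1/gamma) * log
    of the spectral radius of Q(s, s') = exp(gamma * r(s, pi(s))) * P_pi(s, s'),
    following the Howard-Matheson characterization."""
    n = len(policy)
    Q = np.array([np.exp(gamma * R[s, policy[s]]) * P[s, policy[s]]
                  for s in range(n)])
    rho = max(abs(np.linalg.eigvals(Q)))    # spectral radius of Q
    return np.log(rho) / gamma
```

Comparing such values across a policy set is the natural starting point for any combination scheme of the kind the note describes.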