Simulation-Based Algorithms for Markov Decision Processes

Problems of sequential decision making under uncertainty are common in manufacturing, computer and communication systems, and many such problems can be formulated as Markov Decision Processes (MDPs). Motivated by a capacity expansion and allocation problem in semiconductor manufacturing, we formulate a fab-level decision making problem using a finite-horizon transient MDP model that can integrate life cycle dynamics of the fab and provide a trade-off between immediate and future benefits and costs.

However, for large and complicated systems formulated as MDPs, the classical methodology to compute optimal policies, dynamic programming, suffers from the so-called "curse of dimensionality" (the computational requirement increases exponentially with the number of states/controls) and "curse of modeling" (an explicit model for the cost structure and/or the transition probabilities is not available).
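
To make the two difficulties concrete, the following is a minimal sketch of textbook finite-horizon backward induction, not the formulation used in the thesis. The function name `backward_induction`, the nested-list layout of `cost` and `trans`, and the toy two-state example are all assumptions made for illustration. Each sweep touches every (state, action, next state) triple, so the work per stage grows as |S|^2 |A|, and |S| itself grows exponentially when a state is a vector of several components. The routine also needs `trans` explicitly, which is exactly what the "curse of modeling" rules out.

```python
def backward_induction(cost, trans, terminal_cost, horizon):
    """Textbook finite-horizon dynamic programming (illustrative sketch).

    cost[s][a]      : one-stage cost of action a in state s (assumed known)
    trans[s][a][s2] : probability of moving from s to s2 under a (assumed known)
    terminal_cost[s]: cost charged at the end of the horizon
    Returns (V, policy), where V[s] is the optimal cost-to-go at time 0
    and policy[t][s] is a minimizing action at time t in state s.
    """
    num_states = len(cost)
    num_actions = len(cost[0])
    V = list(terminal_cost)
    policy = []
    for _ in range(horizon):
        new_V = [0.0] * num_states
        decision = [0] * num_states
        for s in range(num_states):
            best_q = float("inf")
            for a in range(num_actions):
                # Expected cost-to-go: this sum is what needs the explicit model.
                q = cost[s][a] + sum(
                    trans[s][a][s2] * V[s2] for s2 in range(num_states)
                )
                if q < best_q:
                    best_q = q
                    decision[s] = a
            new_V[s] = best_q
        V = new_V
        policy.insert(0, decision)  # stages are computed last-to-first
    return V, policy


# Hypothetical toy problem: action 0 stays, action 1 switches state.
toy_cost = [[1.0, 1.5], [0.0, 0.5]]
toy_trans = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
toy_V, toy_policy = backward_induction(toy_cost, toy_trans, [0.0, 0.0], horizon=3)
```

In this toy problem the table `V` has two entries; a state made of ten components with ten levels each would need 10^10 entries, which is where exact dynamic programming stops being practical.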

In problem settings to which our approaches apply, instead of the explicit transition probabilities, outputs are available from either a simulation model or from the actual system. Our methodology is first to find the structure of optimal policies for some special cases, and then to use the structure to construct parameterized heuristic policies for more general cases and implement simulation-based algorithms to determine parameters of the heuristic policies. For the fab-level decision-making problem, we analyze the structure of the optimal policy for a special "one-machine, two-product" case, and discuss the applicability of simulation-based algorithms.

We develop several simulation-based algorithms for MDPs to overcome the difficulties of the "curse of dimensionality" and the "curse of modeling," considering both theoretical and practical issues.

First, we develop a simulation-based policy iteration algorithm for average cost problems under a unichain assumption, relaxing the common recurrent state assumption.

Second, for weighted cost problems, we develop a new two-timescale simulation-based gradient algorithm based on perturbation analysis, provide a theoretical convergence proof, and compare it with two recently proposed simulation-based gradient algorithms.

Third, we propose two new Simultaneous Perturbation Stochastic Approximation (SPSA) algorithms for weighted cost problems and verify their effectiveness via simulation; then, we consider a general SPSA algorithm for function minimization and show its convergence under a weaker assumption: the function does not have to be differentiable.
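
As a rough illustration of the general SPSA idea mentioned above, the sketch below implements the standard two-measurement SPSA recursion with random +1/-1 perturbations. It is not the thesis's algorithms for weighted cost problems. The function name `spsa_minimize`, the gain constants `a`, `c`, `A`, the decay exponents, and the test objective are assumptions chosen for the example. The objective is a sum of absolute values, which is not differentiable at its minimizer, to mirror the setting described; SPSA only ever evaluates the function, never its gradient.

```python
import random


def spsa_minimize(f, theta0, iterations=2000, a=0.5, c=0.1, A=50.0,
                  alpha=0.602, gamma=0.101, seed=0):
    """Standard two-measurement SPSA (illustrative sketch).

    f      : objective taking a list of floats and returning a float
    theta0 : starting point
    a, c, A, alpha, gamma : gain-sequence constants (assumed values)
    """
    rng = random.Random(seed)
    theta = list(theta0)
    for k in range(iterations):
        a_k = a / (k + 1 + A) ** alpha   # step size, decays to zero
        c_k = c / (k + 1) ** gamma       # perturbation size, decays slowly
        # Perturb every coordinate at once with independent +1/-1 draws.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        # Two function evaluations per iteration, whatever the dimension.
        diff = f(plus) - f(minus)
        theta = [t - a_k * diff / (2.0 * c_k * d)
                 for t, d in zip(theta, delta)]
    return theta


def abs_objective(x):
    """Hypothetical non-differentiable test function, minimized at (1, -2)."""
    return abs(x[0] - 1.0) + abs(x[1] + 2.0)


estimate = spsa_minimize(abs_objective, [0.0, 0.0])
```

In a simulation-based setting, `f` would be replaced by a noisy cost estimate obtained from a simulation run of a parameterized policy; the fixed cost of two evaluations per iteration is what makes the method attractive when the parameter vector is long.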