Simulation-Based Algorithms for Markov Decision Processes

He, Ying

dc.contributor.advisor	Marcus, Steven I.	en_US
dc.contributor.advisor	Fu, Michael	en_US
dc.contributor.author	He, Ying	en_US
dc.contributor.department	ISR	en_US
dc.date.accessioned	2007-05-23T10:12:27Z
dc.date.available	2007-05-23T10:12:27Z
dc.date.issued	2002	en_US
dc.description.abstract	Problems of sequential decision making under uncertainty are common in manufacturing, computer, and communication systems, and many such problems can be formulated as Markov Decision Processes (MDPs). Motivated by a capacity expansion and allocation problem in semiconductor manufacturing, we formulate a fab-level decision-making problem using a finite-horizon transient MDP model that can integrate the life cycle dynamics of the fab and provide a trade-off between immediate and future benefits and costs.
However, for large and complicated systems formulated as MDPs, the classical methodology for computing optimal policies, dynamic programming, suffers from the so-called "curse of dimensionality" (the computational requirement increases exponentially with the number of states/controls) and the "curse of modeling" (an explicit model for the cost structure and/or the transition probabilities is not available).
In the problem settings to which our approaches apply, outputs are available from either a simulation model or the actual system instead of explicit transition probabilities. Our methodology is first to find the structure of optimal policies for some special cases, then to use this structure to construct parameterized heuristic policies for more general cases, and finally to implement simulation-based algorithms that determine the parameters of the heuristic policies. For the fab-level decision-making problem, we analyze the structure of the optimal policy for a special "one-machine, two-product" case and discuss the applicability of simulation-based algorithms.
We develop several simulation-based algorithms for MDPs to overcome the difficulties of the "curse of dimensionality" and the "curse of modeling," considering both theoretical and practical issues.
First, we develop a simulation-based policy iteration algorithm for average cost problems under a unichain assumption, relaxing the common recurrent state assumption.
Second, for weighted cost problems, we develop a new two-timescale simulation-based gradient algorithm based on perturbation analysis, provide a theoretical convergence proof, and compare it with two recently proposed simulation-based gradient algorithms.
Third, we propose two new Simultaneous Perturbation Stochastic Approximation (SPSA) algorithms for weighted cost problems and verify their effectiveness via simulation; we then consider a general SPSA algorithm for function minimization and show its convergence under a weaker assumption: the function does not have to be differentiable.	en_US
dc.format.extent	1009040 bytes
dc.format.mimetype	application/pdf
dc.identifier.uri	http://hdl.handle.net/1903/6291
dc.language.iso	en_US	en_US
dc.relation.ispartofseries	ISR; PhD 2002-9	en_US
dc.subject	Next-Generation Product Realization Systems	en_US
dc.title	Simulation-Based Algorithms for Markov Decision Processes	en_US
dc.type	Dissertation	en_US
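
The abstract's third contribution builds on Simultaneous Perturbation Stochastic Approximation (SPSA). As a rough illustration only, the sketch below implements the standard two-measurement SPSA iteration for minimizing a function that is observed only through (possibly noisy) evaluations, such as simulation output. It is not one of the new SPSA algorithms developed in the dissertation; the function name `spsa_minimize`, the Bernoulli +/-1 perturbations, and the gain constants (including the exponents 0.602 and 0.101) are illustrative assumptions.

```python
import random
from typing import Callable, List, Optional


def spsa_minimize(
    loss: Callable[[List[float]], float],
    theta0: List[float],
    iterations: int = 1000,
    a: float = 0.1,
    c: float = 0.1,
    big_a: float = 10.0,
    alpha: float = 0.602,
    gamma: float = 0.101,
    seed: Optional[int] = None,
) -> List[float]:
    """Hypothetical helper: minimize `loss` with the basic two-measurement SPSA update.

    All gain constants are illustrative defaults, not values from the dissertation.
    """
    rng = random.Random(seed)
    theta = list(theta0)
    for k in range(iterations):
        a_k = a / (k + 1 + big_a) ** alpha  # decreasing step-size (gain) sequence
        c_k = c / (k + 1) ** gamma          # decreasing perturbation size
        # Simultaneous perturbation: every coordinate gets an independent +/-1 sign.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        # Two loss evaluations yield a full gradient estimate, whatever the dimension.
        diff = loss(plus) - loss(minus)
        grad = [diff / (2.0 * c_k * d) for d in delta]
        # Stochastic-approximation step against the estimated gradient.
        theta = [t - a_k * g for t, g in zip(theta, grad)]
    return theta
```

For example, calling `spsa_minimize` on a quadratic with minimum at (3, 3), observed with small Gaussian noise, from the start point [0.0, 0.0] should return a point close to (3, 3). The design choice that matters for simulation-based settings is that each iteration needs only two loss evaluations, rather than one or two per coordinate as in finite-difference gradient estimates.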

Files

Original bundle

Name:: PhD_2002-9.pdf
Size:: 985.39 KB
Format:: Adobe Portable Document Format

Collections

Institute for Systems Research Technical Reports