Simulation-Based Algorithms for Markov Decision Processes

Field | Value | Language
dc.contributor.advisor | Marcus, Steven I. | en_US
dc.contributor.advisor | Fu, Michael | en_US
dc.contributor.author | He, Ying | en_US
dc.contributor.department | ISR | en_US
dc.date.accessioned | 2007-05-23T10:12:27Z |
dc.date.available | 2007-05-23T10:12:27Z |
dc.date.issued | 2002 | en_US
dc.description.abstract | Problems of sequential decision making under uncertainty are common in manufacturing, computer and communication systems, and many such problems can be formulated as Markov Decision Processes (MDPs). Motivated by a capacity expansion and allocation problem in semiconductor manufacturing, we formulate a fab-level decision-making problem using a finite-horizon transient MDP model that can integrate the life cycle dynamics of the fab and provide a trade-off between immediate and future benefits and costs.

However, for large and complicated systems formulated as MDPs, the classical methodology for computing optimal policies, dynamic programming, suffers from the so-called "curse of dimensionality" (computational requirements increase exponentially with the number of states/controls) and the "curse of modeling" (an explicit model for the cost structure and/or the transition probabilities is not available).

In the problem settings to which our approaches apply, instead of explicit transition probabilities, outputs are available from either a simulation model or the actual system. Our methodology is first to find the structure of optimal policies for some special cases, then to use that structure to construct parameterized heuristic policies for more general cases, and finally to implement simulation-based algorithms that determine the parameters of the heuristic policies. For the fab-level decision-making problem, we analyze the structure of the optimal policy for a special "one-machine, two-product" case and discuss the applicability of simulation-based algorithms.

We develop several simulation-based algorithms for MDPs to overcome the difficulties of the "curse of dimensionality" and the "curse of modeling," considering both theoretical and practical issues.

First, we develop a simulation-based policy iteration algorithm for average cost problems under a unichain assumption, relaxing the common recurrent state assumption.

Second, for weighted cost problems, we develop a new two-timescale simulation-based gradient algorithm based on perturbation analysis, provide a theoretical convergence proof, and compare it with two recently proposed simulation-based gradient algorithms.

Third, we propose two new Simultaneous Perturbation Stochastic Approximation (SPSA) algorithms for weighted cost problems and verify their effectiveness via simulation; we then consider a general SPSA algorithm for function minimization and show its convergence under a weaker assumption: the function does not have to be differentiable. | en_US
dc.format.extent | 1009040 bytes |
dc.format.mimetype | application/pdf |
dc.identifier.uri | http://hdl.handle.net/1903/6291 |
dc.language.iso | en_US | en_US
dc.relation.ispartofseries | ISR; PhD 2002-9 | en_US
dc.subject | Next-Generation Product Realization Systems | en_US
dc.title | Simulation-Based Algorithms for Markov Decision Processes | en_US
dc.type | Dissertation | en_US
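The abstract contrasts simulation-based methods with classical dynamic programming. As an illustration only (not the dissertation's fab-level model), the following is a minimal sketch of finite-horizon dynamic programming by backward induction on a small, hypothetical two-state, two-action model. The function name `backward_induction`, the states, actions, costs, and transition probabilities are all made-up assumptions. The sketch shows both difficulties named above: every stage loops over all states, actions, and successor states, so the work grows with the size of the state and control spaces, and the transition probabilities `P` must be known explicitly.

```python
def backward_induction(states, actions, P, C, terminal_cost, horizon):
    """Finite-horizon DP (cost minimization) on an explicit model.

    P[s][a] is a list of (next_state, probability) pairs and C[s][a] is the
    one-stage cost. Returns the stage-0 value function and a list of
    per-stage decision rules (policy[t][s] is the action taken at stage t).
    """
    V = {s: terminal_cost[s] for s in states}
    policy = []
    for t in reversed(range(horizon)):
        V_new, decision = {}, {}
        for s in states:
            best_a, best_q = None, float("inf")
            for a in actions:
                # Expected cost-to-go of taking action a in state s at stage t.
                q = C[s][a] + sum(p * V[s2] for s2, p in P[s][a])
                if q < best_q:
                    best_a, best_q = a, q
            V_new[s], decision[s] = best_q, best_a
        V = V_new
        policy.insert(0, decision)  # keep stages in forward order
    return V, policy


# Hypothetical toy model: state 0 = low capacity, state 1 = high capacity.
states = [0, 1]
actions = ["keep", "expand"]
P = {
    0: {"keep": [(0, 1.0)], "expand": [(1, 1.0)]},
    1: {"keep": [(1, 0.9), (0, 0.1)], "expand": [(1, 1.0)]},
}
C = {0: {"keep": 2.0, "expand": 3.0}, 1: {"keep": 0.5, "expand": 1.0}}

V, policy = backward_induction(states, actions, P, C, {0: 0.0, 1: 0.0}, horizon=3)
print(V, policy)
```

In this toy instance, expanding from the low-capacity state pays off early in the horizon but not at the last stage, which is the kind of immediate-versus-future trade-off a finite-horizon model captures.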

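The third contribution builds on Simultaneous Perturbation Stochastic Approximation (SPSA). For readers unfamiliar with it, the following is a minimal sketch of the basic SPSA recursion as it is commonly presented in the literature; it is not one of the dissertation's new SPSA variants for weighted cost problems. The function name `spsa_minimize`, the gain constants (`a`, `c`, `A`, and the often-recommended exponents `alpha = 0.602`, `gamma = 0.101`), and the noisy quadratic test objective are illustrative assumptions. The key idea is that all parameters are perturbed at once by a random ±1 vector, so each gradient estimate needs only two (possibly noisy, e.g. simulated) loss evaluations regardless of the dimension.

```python
import random


def spsa_minimize(loss, theta0, iterations=2000, a=0.1, c=0.1, A=100.0,
                  alpha=0.602, gamma=0.101, seed=0):
    """Basic SPSA for minimizing a loss observed only through noisy evaluations."""
    rng = random.Random(seed)
    theta = [float(t) for t in theta0]
    for k in range(iterations):
        a_k = a / (k + 1 + A) ** alpha   # decreasing step size
        c_k = c / (k + 1) ** gamma       # decreasing perturbation size
        # Rademacher (+1/-1) perturbation applied to every coordinate at once.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        diff = loss(plus) - loss(minus)  # only two loss evaluations per step
        # Simultaneous-perturbation gradient estimate, one entry per coordinate.
        grad = [diff / (2.0 * c_k * d) for d in delta]
        theta = [t - a_k * g for t, g in zip(theta, grad)]
    return theta


# Hypothetical noisy objective with minimizer (1, -2), standing in for a
# simulation-estimated cost.
noise = random.Random(1)


def noisy_quadratic(x):
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2 + noise.gauss(0.0, 0.01)


theta_hat = spsa_minimize(noisy_quadratic, [0.0, 0.0])
print(theta_hat)
```

With these settings the returned point should lie close to the minimizer (1, -2). The decreasing gain sequences are what allow the iterates to average out the evaluation noise; the specific constants here were picked for this toy problem and would need tuning elsewhere.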
Files

Original bundle
Name: PhD_2002-9.pdf
Size: 985.39 KB
Format: Adobe Portable Document Format