Institute for Systems Research
Permanent URI for this community: http://hdl.handle.net/1903/4375
Item Simulation-Based Algorithms for Markov Decision Processes (2002) He, Ying; Marcus, Steven I.; Fu, Michael; ISR

Problems of sequential decision making under uncertainty are common in manufacturing, computer, and communication systems, and many such problems can be formulated as Markov Decision Processes (MDPs). Motivated by a capacity expansion and allocation problem in semiconductor manufacturing, we formulate a fab-level decision-making problem using a finite-horizon transient MDP model that can integrate life cycle dynamics of the fab and provide a trade-off between immediate and future benefits and costs. However, for large and complicated systems formulated as MDPs, the classical methodology for computing optimal policies, dynamic programming, suffers from the so-called "curse of dimensionality" (the computational requirement increases exponentially with the number of states/controls) and the "curse of modeling" (an explicit model for the cost structure and/or the transition probabilities is not available).
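For concreteness, here is a minimal sketch of the classical dynamic programming (backward induction) recursion for a finite-horizon MDP. The random transition matrices, cost table, and problem sizes are illustrative assumptions, not the fab model; the point is that the tables swept by the recursion grow with the number of states and controls, which is the "curse of dimensionality" in miniature.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, horizon = 5, 3, 10

# Illustrative model: P[a][s, s'] = transition probability under
# control a; c[s, a] = one-stage cost.
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
c = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

V = np.zeros(n_states)                      # terminal cost V_T = 0
policy = np.zeros((horizon, n_states), dtype=int)
for t in reversed(range(horizon)):
    # Q[s, a] = c(s, a) + E[ V_{t+1}(s') | s, a ]
    Q = c + np.stack([P[a] @ V for a in range(n_actions)], axis=1)
    policy[t] = Q.argmin(axis=1)            # minimize expected cost
    V = Q.min(axis=1)

print("optimal expected cost from each initial state:", V)
```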
In problem settings to which our approaches apply, instead of the explicit transition probabilities, outputs are available from either a simulation model or from the actual system. Our methodology is first to find the structure of optimal policies for some special cases, and then to use the structure to construct parameterized heuristic policies for more general cases and implement simulation-based algorithms to determine parameters of the heuristic policies. For the fab-level decision-making problem, we analyze the structure of the optimal policy for a special "one-machine, two-product" case, and discuss the applicability of simulation-based algorithms.
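That two-step methodology can be sketched under toy assumptions: below, a hypothetical single-queue simulator stands in for the fab model, the heuristic policy is a one-parameter admission threshold, and the parameter is chosen purely from simulated cost, with no transition probabilities in sight.

```python
import random

def simulate_cost(theta, horizon=50, n_runs=200, seed=1):
    """Average simulated cost of a hypothetical threshold policy:
    admit arriving work only while the queue is shorter than theta."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_runs):
        q = 0
        for _ in range(horizon):
            if rng.random() < 0.6:              # an arrival occurs
                if q < theta:
                    q += 1                      # admitted
                else:
                    total += 2.0                # rejection penalty
            if q > 0 and rng.random() < 0.5:    # service completion
                q -= 1
            total += q                          # holding cost
    return total / n_runs

# Pick the threshold purely from simulation output.
best = min(range(1, 11), key=simulate_cost)
print("best threshold found by simulation:", best)
```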
We develop several simulation-based algorithms for MDPs to overcome the difficulties of the "curse of dimensionality" and the "curse of modeling," considering both theoretical and practical issues.
First, we develop a simulation-based policy iteration algorithm for average cost problems under a unichain assumption, relaxing the common recurrent state assumption.
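The following crude sketch shows only the shape of such an algorithm, not the thesis algorithm itself: policy evaluation estimates the average cost and differential values from a single simulated trajectory, and policy improvement uses simulated one-step lookahead. The black-box simulator, the every-visit tail-sum estimator, and all constants are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA = 4, 2
P = rng.dirichlet(np.ones(nS), size=(nA, nS))   # hidden from the algorithm
c = rng.uniform(0.0, 1.0, size=(nS, nA))

def step(s, a):
    """Black-box simulator: one-stage cost and a sampled next state."""
    return c[s, a], int(rng.choice(nS, p=P[a, s]))

def evaluate(pi, n_steps=20000):
    """Crudely estimate the average cost eta and differential values
    h(s) from one long trajectory via every-visit tail sums."""
    s, path, costs = 0, [], []
    for _ in range(n_steps):
        cost, s_next = step(s, pi[s])
        path.append(s)
        costs.append(cost)
        s = s_next
    eta = float(np.mean(costs))
    tail = np.cumsum((np.array(costs) - eta)[::-1])[::-1]
    h_sum, visits = np.zeros(nS), np.zeros(nS)
    for s_t, g in zip(path, tail):
        h_sum[s_t] += g
        visits[s_t] += 1
    return eta, h_sum / np.maximum(visits, 1.0)

pi = np.zeros(nS, dtype=int)
for it in range(5):
    eta, h = evaluate(pi)
    Q = np.zeros((nS, nA))
    for s in range(nS):             # improvement: simulated lookahead
        for a in range(nA):
            samples = [cost + h[s2] for cost, s2 in
                       (step(s, a) for _ in range(200))]
            Q[s, a] = np.mean(samples)
    pi = Q.argmin(axis=1)
    print(f"iteration {it}: estimated average cost = {eta:.3f}")
```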
Second, for weighted cost problems, we develop a new two-timescale simulation-based gradient algorithm based on perturbation analysis, provide a theoretical convergence proof, and compare it with two recently proposed simulation-based gradient algorithms.
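A minimal sketch of the two-timescale idea, under assumed step-size schedules and a stand-in noisy gradient oracle (playing the role that perturbation-analysis estimates play in the algorithm above): a faster iterate averages the noisy gradients while a slower iterate moves the parameter.

```python
import random

random.seed(0)

def noisy_grad(theta):
    """Stand-in for a perturbation-analysis gradient estimate:
    true gradient 2*(theta - 3) of (theta - 3)^2, plus noise."""
    return 2.0 * (theta - 3.0) + random.gauss(0.0, 1.0)

theta, g_avg = 0.0, 0.0
for n in range(1, 20001):
    fast = 1.0 / n ** 0.6   # faster timescale: track the gradient
    slow = 1.0 / n          # slower timescale: move the parameter
    g_avg += fast * (noisy_grad(theta) - g_avg)
    theta -= slow * g_avg

print(f"theta should settle near the minimizer 3: {theta:.3f}")
```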
Third, we propose two new Simultaneous Perturbation Stochastic Approximation (SPSA) algorithms for weighted cost problems and verify their effectiveness via simulation; then, we consider a general SPSA algorithm for function minimization and show its convergence under a weaker assumption: the function does not have to be differentiable.
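For reference, here is a minimal sketch of the standard SPSA recursion for function minimization (the thesis variants for weighted cost problems are not reproduced): each iteration perturbs all coordinates simultaneously in a random Rademacher direction, so two noisy function evaluations estimate the gradient regardless of dimension. The test function and gain constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def f_noisy(theta):
    """Only noisy measurements of f(theta) = ||theta - 1||^2 are
    available; SPSA never queries a gradient."""
    return float(np.sum((theta - 1.0) ** 2) + rng.normal(0.0, 0.1))

theta = np.zeros(4)
for k in range(1, 2001):
    a_k = 0.1 / k ** 0.602                  # standard gain sequences
    c_k = 0.1 / k ** 0.101
    delta = rng.choice([-1.0, 1.0], size=theta.shape)   # Rademacher
    # Two evaluations estimate the whole gradient at once.
    g_hat = (f_noisy(theta + c_k * delta) -
             f_noisy(theta - c_k * delta)) / (2.0 * c_k * delta)
    theta -= a_k * g_hat

print("theta ~", np.round(theta, 2))    # near the minimizer (1,1,1,1)
```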
Item Approximate Policy Iteration for Semiconductor Fab-Level Decision Making - a Case Study (2000) He, Ying; Bhatnagar, Shalabh; Fu, Michael C.; Marcus, Steven I.; ISR

In this paper, we propose an approximate policy iteration (API) algorithm for a semiconductor fab-level decision making problem. This problem is formulated as a discounted cost Markov Decision Process (MDP), and we have applied exact policy iteration to solve a simple example in prior work. However, the overwhelming computational requirements of exact policy iteration prevent its application to larger problems. Approximate policy iteration overcomes this obstacle by approximating the cost-to-go using function approximation. Numerical simulation on the same example shows that the proposed API algorithm leads to a policy with cost close to that of the optimal policy.

Item Simulation-Based Approach for Semiconductor Fab-Level Decision Making - Implementation Issues (2000) He, Ying; Fu, Michael C.; Marcus, Steven I.; ISR

In this paper, we discuss implementation issues of applying a simulation-based approach to a semiconductor fab-level decision making problem. The fab-level decision making problem is formulated as a Markov Decision Process (MDP). We intend to use a simulation-based approach since it can break the "curse of dimensionality" and the "curse of modeling" for an MDP with large state and control spaces. We focus on how to parameterize the state space and the control space.
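As a rough illustration of the approximate policy iteration idea in the first item above (which also shows one way a cost-to-go over a large state space can be parameterized), here is a sketch under toy assumptions: the cost-to-go of the current policy is fit by least squares on simulated returns over a small feature basis, and the policy is then improved by simulated one-step lookahead. The chain, features, and constants are all illustrative, not the fab-level model of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma = 6, 2, 0.9
P = rng.dirichlet(np.ones(nS), size=(nA, nS))
c = rng.uniform(0.0, 1.0, size=(nS, nA))

def step(s, a):
    """Black-box simulator: one-stage cost and a sampled next state."""
    return c[s, a], int(rng.choice(nS, p=P[a, s]))

def rollout(s, pi, depth=60):
    """Simulated discounted cost of policy pi starting from state s."""
    total, disc = 0.0, 1.0
    for _ in range(depth):
        cost, s = step(s, pi[s])
        total += disc * cost
        disc *= gamma
    return total

# Illustrative feature basis: constant, linear, quadratic in the state.
phi = np.column_stack([np.ones(nS), np.arange(nS), np.arange(nS) ** 2])
pi = np.zeros(nS, dtype=int)
for it in range(5):
    # Approximate policy evaluation: fit J_pi(s) ~ phi(s) @ w.
    returns = np.array([np.mean([rollout(s, pi) for _ in range(30)])
                        for s in range(nS)])
    w, *_ = np.linalg.lstsq(phi, returns, rcond=None)
    J = phi @ w
    # Policy improvement via simulated one-step lookahead.
    Q = np.zeros((nS, nA))
    for s in range(nS):
        for a in range(nA):
            samples = [cost + gamma * J[s2] for cost, s2 in
                       (step(s, a) for _ in range(30))]
            Q[s, a] = np.mean(samples)
    pi = Q.argmin(axis=1)
    print(f"iteration {it}: mean fitted cost-to-go = {J.mean():.3f}")
```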