Digital Repository at the University of Maryland (DRUM), University of Maryland Libraries

    Simulation-Based Algorithms for Markov Decision Processes

File: PhD_2002-9.pdf (985.3 KB)

Date: 2002
Author: He, Ying
Advisors: Marcus, Steven I.; Fu, Michael
    Abstract
Problems of sequential decision making under uncertainty are common in manufacturing, computer, and communication systems, and many such problems can be formulated as Markov Decision Processes (MDPs). Motivated by a capacity expansion and allocation problem in semiconductor manufacturing, we formulate a fab-level decision-making problem using a finite-horizon transient MDP model that can integrate life cycle dynamics of the fab and provide a trade-off between immediate and future benefits and costs.

However, for large and complicated systems formulated as MDPs, the classical methodology to compute optimal policies, dynamic programming, suffers from the so-called "curse of dimensionality" (the computational requirement increases exponentially with the number of states/controls) and "curse of modeling" (an explicit model for the cost structure and/or the transition probabilities is not available).

In problem settings to which our approaches apply, instead of the explicit transition probabilities, outputs are available from either a simulation model or from the actual system. Our methodology is first to find the structure of optimal policies for some special cases, then to use that structure to construct parameterized heuristic policies for more general cases, and finally to implement simulation-based algorithms to determine the parameters of the heuristic policies. For the fab-level decision-making problem, we analyze the structure of the optimal policy for a special "one-machine, two-product" case and discuss the applicability of simulation-based algorithms.

We develop several simulation-based algorithms for MDPs to overcome the difficulties of the "curse of dimensionality" and the "curse of modeling," considering both theoretical and practical issues.

First, we develop a simulation-based policy iteration algorithm for average cost problems under a unichain assumption, relaxing the common recurrent state assumption.
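The general simulation-based policy iteration loop can be sketched as follows. This is a minimal illustration, not the thesis's algorithm: it uses a hypothetical two-state, two-action simulator with made-up transition probabilities, and a discounted-cost criterion rather than the average-cost/unichain setting treated in the thesis. The key idea it shows is the same, though: Q-factors are estimated purely from simulated trajectories, with no explicit transition matrix.

```python
import random

def simulate(state, action):
    """Hypothetical simulator for a 2-state, 2-action MDP: returns
    (next_state, cost).  Stands in for a real system or simulation
    model; the probabilities and costs below are illustrative only."""
    if action == 0:
        nxt = state if random.random() < 0.7 else 1 - state
        return nxt, 1.0 + state
    nxt = 1 - state if random.random() < 0.6 else state
    return nxt, 2.0 - state

def q_estimate(state, action, policy, n_rollouts=200, horizon=30, beta=0.9):
    """Estimate the discounted Q-factor of (state, action) under `policy`
    from simulated rollouts alone (no transition probabilities needed)."""
    total = 0.0
    for _ in range(n_rollouts):
        s, a, disc, ret = state, action, 1.0, 0.0
        for _ in range(horizon):
            s, cost = simulate(s, a)
            ret += disc * cost
            disc *= beta
            a = policy[s]          # follow the fixed policy after step one
        total += ret
    return total / n_rollouts

def policy_iteration(states=(0, 1), actions=(0, 1)):
    """Policy iteration with simulation-based policy evaluation."""
    policy = {s: 0 for s in states}
    for _ in range(10):            # outer policy-improvement loop
        new_policy = {
            s: min(actions, key=lambda a: q_estimate(s, a, policy))
            for s in states
        }
        if new_policy == policy:   # converged: policy is greedy w.r.t. itself
            break
        policy = new_policy
    return policy
```

Because evaluation is Monte Carlo, each improvement step is only approximately greedy; more rollouts trade computation for lower estimation noise.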
Second, for weighted cost problems, we develop a new two-timescale simulation-based gradient algorithm based on perturbation analysis, provide a theoretical convergence proof, and compare it with two recently proposed simulation-based gradient algorithms.

Third, we propose two new Simultaneous Perturbation Stochastic Approximation (SPSA) algorithms for weighted cost problems and verify their effectiveness via simulation; then we consider a general SPSA algorithm for function minimization and show its convergence under a weaker assumption: the function does not have to be differentiable.
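A basic SPSA iteration for function minimization, of the kind the abstract refers to, can be sketched as follows. The gain-sequence constants (`a`, `c`, `alpha`, `gamma`) are common illustrative choices, not the thesis's settings; the point is that each iteration needs only two (possibly noisy) function evaluations, regardless of the dimension of the parameter vector.

```python
import random

def spsa_minimize(f, theta, n_iter=2000, a=0.1, c=0.1, alpha=0.602, gamma=0.101):
    """Minimize f by Simultaneous Perturbation Stochastic Approximation.

    All coordinates are perturbed at once with a random Rademacher
    direction, so the gradient estimate costs two evaluations of f
    per iteration independent of the dimension of theta."""
    theta = list(theta)
    d = len(theta)
    for k in range(1, n_iter + 1):
        ak = a / k ** alpha                                   # decaying step size
        ck = c / k ** gamma                                   # decaying perturbation
        delta = [random.choice([-1.0, 1.0]) for _ in range(d)]  # Rademacher direction
        plus = [t + ck * dl for t, dl in zip(theta, delta)]
        minus = [t - ck * dl for t, dl in zip(theta, delta)]
        diff = f(plus) - f(minus)
        # Simultaneous-perturbation gradient estimate, one step of descent
        theta = [t - ak * diff / (2.0 * ck * dl) for t, dl in zip(theta, delta)]
    return theta
```

For example, on the smooth test function f(x) = sum((x_i - 1)^2), the iterates drift toward the minimizer at all-ones; the thesis's convergence result covers weaker conditions, including non-differentiable functions.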
URI: http://hdl.handle.net/1903/6291
Collections: Institute for Systems Research Technical Reports

    DRUM is brought to you by the University of Maryland Libraries
    University of Maryland, College Park, MD 20742-7011 (301)314-1328.
    Please send us your comments.
    Web Accessibility