Institute for Systems Research Technical Reports

Permanent URI for this collection: http://hdl.handle.net/1903/4376

This archive contains a collection of reports generated by the faculty and students of the Institute for Systems Research (ISR), a permanent, interdisciplinary research unit in the A. James Clark School of Engineering at the University of Maryland. ISR-based projects are conducted through partnerships with industry and government, bringing together faculty and students from multiple academic departments and colleges across the university.

Search Results

Now showing 1 - 7 of 7
  • A Model Reference Adaptive Search Algorithm for Global Optimization
    (2005) Hu, Jiaqiao; Fu, Michael C.; Marcus, Steven I.; ISR
  • Evolutionary Policy Iteration for Solving Markov Decision Processes
    (2002) Chang, Hyeong Soo; Lee, Hong-Gi; Fu, Michael C.; Marcus, Steven I.; ISR; CSHCN
    We propose a novel algorithm called Evolutionary Policy Iteration (EPI) for solving infinite horizon discounted reward Markov Decision Process (MDP) problems. EPI inherits the spirit of the well-known policy iteration (PI) algorithm but eliminates the need to maximize over the entire action space in the policy improvement step, so it should be most effective for problems with very large action spaces. EPI iteratively generates a "population," or set of policies, such that the performance of the "elite policy" for a population is monotonically improved with respect to a defined fitness function. EPI converges with probability one to a population whose elite policy is an optimal policy for the given MDP. EPI is naturally parallelizable, and along these lines a distributed variant is also studied.
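
    A minimal, illustrative sketch (in Python) of the population-based idea above, written for a small finite MDP with a known model; the population size, mutation rate, and the use of exact policy evaluation are assumptions for illustration, not the authors' implementation.

      import numpy as np

      def evaluate_policy(P, R, policy, gamma):
          # Exact policy evaluation: solve (I - gamma * P_pi) v = r_pi.
          n = R.shape[0]
          P_pi = P[np.arange(n), policy]   # n x n transition matrix under the policy
          r_pi = R[np.arange(n), policy]   # one-step reward under the policy
          return np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)

      def elite_policy(P, R, population, gamma):
          # "Policy switching": at each state keep the action of the member whose
          # value is largest there, so the elite improves on every member.
          values = np.array([evaluate_policy(P, R, pi, gamma) for pi in population])
          best = values.argmax(axis=0)
          return np.array([population[best[s]][s] for s in range(R.shape[0])])

      def epi(P, R, gamma=0.95, pop_size=8, mutation=0.2, iters=50, seed=0):
          # P: transition probabilities, shape (states, actions, states);
          # R: rewards, shape (states, actions).
          rng = np.random.default_rng(seed)
          n, m = R.shape
          population = [rng.integers(m, size=n) for _ in range(pop_size)]
          elite = population[0]
          for _ in range(iters):
              elite = elite_policy(P, R, population, gamma)
              # Next generation: keep the elite and mutate it to explore new actions.
              population = [elite] + [
                  np.where(rng.random(n) < mutation, rng.integers(m, size=n), elite)
                  for _ in range(pop_size - 1)
              ]
          return elite, evaluate_policy(P, R, elite, gamma)

    Because the elite is formed by state-wise switching among the current population, its value is at least that of every member, which mirrors the monotone improvement of the elite policy described in the abstract.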
  • Risk-Sensitive Probability for Markov Chains
    (2002) Ramezani, Vahid R.; Marcus, Steven I.; ISR
    The probability distribution of a Markov chain is viewed as the information state of an additive optimization problem. This optimization problem is then generalized to a product form whose information state gives rise to a generalized notion of probability distribution for Markov chains. The evolution and the asymptotic behavior of this generalized or "risk-sensitive" probability distribution are studied in this paper, and a conjecture is proposed regarding the asymptotic periodicity of risk-sensitive probability. The relation between a set of simultaneous non-linear equations and the set of periodic attractors is analyzed.
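
    For intuition only (a standard product-form construction from the risk-sensitive filtering literature, and merely an assumption about the exact form used in this report): the ordinary distribution of a chain with transition matrix $P$ evolves linearly as $p_{k+1}^T = p_k^T P$, whereas a product-form generalization with per-state cost $c(i)$ and risk parameter $\theta$ propagates an unnormalized measure $\sigma_{k+1}^T = \sigma_k^T \,\mathrm{diag}(e^{\theta c(i)})\, P$ and obtains a "risk-sensitive" distribution by normalization, $\tilde{p}_k = \sigma_k / (\mathbf{1}^T \sigma_k)$. The normalization makes the recursion nonlinear, which is why questions such as asymptotic periodicity and the structure of periodic attractors arise.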

  • An Adaptive Sampling Algorithm for Solving Markov Decision Processes
    (2002) Chang, Hyeong Soo; Fu, Michael C.; Marcus, Steven I.; ISR
    Based on recent results for multi-armed bandit problems, we propose an adaptive sampling algorithm that approximates the optimal value of a finite horizon Markov decision process (MDP) with infinite state space but finite action space and bounded rewards. The algorithm adaptively chooses which action to sample as the sampling process proceeds, and it is proven that the estimate produced by the algorithm is asymptotically unbiased and the worst possible bias is bounded by a quantity that converges to zero at rate $O\left(\frac{H\ln N}{N}\right)$, where $H$ is the horizon length and $N$ is the total number of samples that are used per state sampled in each stage. The worst-case running-time complexity of the algorithm is $O((|A|N)^H)$, independent of the state space size, where $|A|$ is the size of the action space. The algorithm can be used to create an approximate receding horizon control to solve infinite horizon MDPs.
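
    As a rough illustration of the bandit-based idea (a sketch under assumptions, not the report's exact algorithm): each state visited at each stage allocates its $N$ samples across actions with a UCB1-style index and recurses on the sampled next states, which is where the $O((|A|N)^H)$ worst-case complexity comes from. The interface simulate(state, action), returning a sampled reward and next state, is a hypothetical placeholder.

      import math
      from collections import defaultdict

      def sampled_value(simulate, actions, state, stage, horizon, n_samples):
          # Estimate the optimal value of `state` at `stage` using n_samples >= |A| samples.
          if stage == horizon:
              return 0.0
          counts = defaultdict(int)      # times each action has been sampled at this state
          totals = defaultdict(float)    # accumulated Q-value estimates per action
          for i in range(n_samples):
              if i < len(actions):
                  a = actions[i]         # initialization: sample every action once
              else:
                  # UCB1-style index: empirical mean plus an exploration bonus.
                  a = max(actions, key=lambda b: totals[b] / counts[b]
                          + math.sqrt(2.0 * math.log(i) / counts[b]))
              reward, next_state = simulate(state, a)
              q = reward + sampled_value(simulate, actions, next_state,
                                         stage + 1, horizon, n_samples)
              counts[a] += 1
              totals[a] += q
          # The value estimate is the sample-count-weighted average over actions.
          return sum(totals[a] for a in actions) / n_samples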
  • Markov Games: Receding Horizon Approach
    (2001) Chang, Hyeong Soo; Marcus, Steven I.; ISR
    We consider a receding horizon approach as an approximate solution to two-person zero-sum Markov games with infinite horizon discounted cost and average cost criteria.

    We first present error bounds from the optimal equilibrium value of the game when both players take correlated equilibrium receding horizon policies that are based on exact or approximate solutions of receding finite horizon subgames. Motivated by the worst-case optimal control of queueing systems by Altman, we then analyze error bounds when the minimizer plays the (approximate) receding horizon control and the maximizer plays the worst case policy.

    We give three heuristic examples of the approximate receding horizon control. We extend "rollout" by Bertsekas and Castanon and "parallel rollout" and "hindsight optimization" by Chang et al. into the Markov game setting within the framework of the approximate receding horizon approach and analyze their performances.

    In the rollout/parallel rollout approaches, the minimizing player seeks to improve the performance of a single heuristic policy it rolls out, or to combine dynamically multiple heuristic policies in a set so as to improve the performances of all of the heuristic policies simultaneously, under the assumption that the maximizing player has chosen a fixed worst-case policy. Given $\epsilon > 0$, we give the value of the receding horizon which guarantees that the parallel rollout policy with that horizon, played by the minimizer, dominates any heuristic policy in the set by $\epsilon$. In the hindsight optimization approach, the minimizing player makes a decision based on his expected optimal hindsight performance over a finite horizon.

    We finally discuss practical implementations of the receding horizon approaches via simulation.
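
    The sketch below illustrates the parallel-rollout idea for the minimizing player, under the assumption that the maximizer follows a fixed worst-case policy as discussed above; the simulator interface simulate(state, u, w), returning a one-step cost and next state, and the policy interfaces are hypothetical placeholders rather than the paper's implementation.

      def rollout_cost(simulate, base_policy, worst_case, state, horizon):
          # One trajectory in which the minimizer follows a heuristic base policy
          # and the maximizer plays the assumed worst-case policy.
          total = 0.0
          for _ in range(horizon):
              u = base_policy(state)
              w = worst_case(state)
              cost, state = simulate(state, u, w)
              total += cost
          return total

      def parallel_rollout_action(simulate, min_actions, base_policies, worst_case,
                                  state, horizon, n_rollouts=20):
          # One-step lookahead for the minimizer: estimate each candidate action's
          # immediate cost plus the continuation cost of the best heuristic policy,
          # then choose the minimizing action.
          def q_estimate(u):
              w = worst_case(state)
              total = 0.0
              for _ in range(n_rollouts):
                  cost, nxt = simulate(state, u, w)
                  total += cost + min(
                      rollout_cost(simulate, pi, worst_case, nxt, horizon - 1)
                      for pi in base_policies)
              return total / n_rollouts
          return min(min_actions, key=q_estimate)

    Taking the minimum over the heuristic continuations is the mechanism behind the domination property stated in the abstract: with a suitable horizon, the resulting policy performs at least as well, up to $\epsilon$, as every heuristic policy in the set.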

  • Approximate Policy Iteration for Semiconductor Fab-Level Decision Making - a Case Study
    (2000) He, Ying; Bhatnagar, Shalabh; Fu, Michael C.; Marcus, Steven I.; ISR
    In this paper, we propose an approximate policy iteration (API) algorithm for a semiconductor fab-level decision making problem. This problem is formulated as a discounted cost Markov Decision Process (MDP), and we have applied exact policy iteration to solve a simple example in prior work. However, the overwhelming computational requirements of exact policy iteration prevent its application to larger problems. Approximate policy iteration overcomes this obstacle by approximating the cost-to-go using function approximation. Numerical simulation on the same example shows that the proposed API algorithm leads to a policy with cost close to that of the optimal policy.
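
    A minimal sketch of an API scheme of the kind described above, assuming a linear cost-to-go approximation fit by least squares to simulated discounted costs; the feature map, the simulator interface simulate(state, action) -> (cost, next_state), and all constants are illustrative assumptions rather than the paper's design.

      import numpy as np

      def approximate_policy_iteration(simulate, features, actions, start_states,
                                       gamma=0.95, n_iters=10, n_traj=200,
                                       horizon=50, seed=0):
          rng = np.random.default_rng(seed)
          theta = np.zeros(len(features(start_states[0])))   # weights of the cost-to-go fit

          def cost_to_go(s):
              return np.asarray(features(s)) @ theta

          def greedy_action(s):
              # Sampled one-step lookahead against the current approximation.
              def q(a):
                  c, nxt = simulate(s, a)
                  return c + gamma * cost_to_go(nxt)
              return min(actions, key=q)

          for _ in range(n_iters):
              # Approximate policy evaluation: regress simulated discounted costs
              # of the current greedy policy onto the features of the start states.
              X, y = [], []
              for _ in range(n_traj):
                  s = start_states[rng.integers(len(start_states))]
                  start, total, disc = s, 0.0, 1.0
                  for _ in range(horizon):
                      c, s = simulate(s, greedy_action(s))
                      total += disc * c
                      disc *= gamma
                  X.append(np.asarray(features(start)))
                  y.append(total)
              theta, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
          return greedy_action, cost_to_go

    The greedy one-step lookahead plays the role of the policy improvement step, and the regression replaces exact policy evaluation, which is what removes the need to enumerate the full state space.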
  • Simulation-Based Approach for Semiconductor Fab-Level Decision Making - Implementation Issues
    (2000) He, Ying; Fu, Michael C.; Marcus, Steven I.; ISR
    In this paper, we discuss implementation issues of applying a simulation-based approach to a semiconductor fab-level decision making problem. The fab-level decision making problem is formulated as a Markov Decision Process (MDP). We intend to use a simulation-based approach since it can break the "curse of dimensionality" and the "curse of modeling" for an MDP with large state and control spaces. We focus on how to parameterize the state space and the control space.