Institute for Systems Research Technical Reports

Permanent URI for this collectionhttp://hdl.handle.net/1903/4376

This archive contains a collection of reports generated by the faculty and students of the Institute for Systems Research (ISR), a permanent, interdisciplinary research unit in the A. James Clark School of Engineering at the University of Maryland. ISR-based projects are conducted through partnerships with industry and government, bringing together faculty and students from multiple academic departments and colleges across the university.

Browse

Search Results

Now showing 1 - 5 of 5
  • Thumbnail Image
    Item
    A Model Reference Adaptive Search Algorithm for Global Optimization
    (2005) Hu, Jiaqiao; Fu, Michael C.; Marcus, Steven I.; mfu; ISR
  • Thumbnail Image
    Item
    Evolutionary Policy Iteration for Solving Markov Decision Processes
    (2002) Chang, Hyeong Soo; Lee, Hong-Gi; Fu, Michael C.; Marcus, Steven I.; ISR; CSHCN
    We propose a novel algorithm called Evolutionary Policy Iteration (EPI) for solving infinite horizon discounted reward Markov Decision Process (MDP) problems. EPI inherits the spirit of the well-known PI algorithm but eliminates the need to maximize over the entire action space in the policy improvement step, so it should be most effective for problems with very large action spaces. EPI iteratively generates a "population" or a set of policies such that the performance of the "elite policy" for a population is monotonically improved with respect to a defined fitness function. EPI converges with probability one to a population whose elite policy is an optimal policy for a given MDP. EPI is naturally parallelizable and along this discussion, a distributed variant of PI is also studied.
  • Thumbnail Image
    Item
    An Adaptive Sampling Algorithm for Solving Markov Decision Processes
    (2002) Chang, Hyeong Soo; Fu, Michael C.; Marcus, Steven I.; ISR
    Based on recent results for multi-armed bandit problems, we propose an adaptive sampling algorithm that approximates the optimal value of a finite horizon Markov decision process (MDP) with infinite state space but finite action space and bounded rewards. The algorithm adaptively chooses which action to sample as the sampling process proceeds, and it is proven that the estimate produced by the algorithm is asymptotically unbiased and the worst possible bias is bounded by a quantity that converges to zero at rate $Oleft ( rac{Hln N}{N} ight)$, where $H$ is the horizon length and $N$ is the total number of samples that are used per state sampled in each stage. The worst-case running-time complexity of the algorithm is $O((|A|N)^H)$, independent of the state space size, where $|A|$ is the size of the action space. The algorithm can be used to create an approximate receding horizon control to solve infinite horizon MDPs.
  • Thumbnail Image
    Item
    Approximate Policy Iteration for Semiconductor Fab-Level Decision Making - a Case Study
    (2000) He, Ying; Bhatnagar, Shalabh; Fu, Michael C.; Marcus, Steven I.; Marcus, Steven I.; ISR
    In this paper, we propose an approximate policy iteration (API) algorithm for asemiconductor fab-level decision making problem. This problem is formulated as adiscounted cost Markov Decision Process (MDP), and we have applied exact policy iterationto solve a simple example in prior work. However, the overwhelmingcomputational requirements of exact policy iteration prevent its application forlarger problems. Approximate policy iteration overcomes this obstacle by approximating thecost-to-go using function approximation. Numerical simulation on the same example showsthat the proposed API algorithm leads to a policy with cost close to that of the optimalpolicy.
  • Thumbnail Image
    Item
    Simulation-Based Approach for Semiconductor Fab-Level Decision Making - Implementation Issues
    (2000) He, Ying; Fu, Michael C.; Marcus, Steven I.; Marcus, Steven I.; ISR
    In this paper, we discuss implementation issues of applying a simulation-based approach to asemiconductor fab-level decision making problem. The fab-level decision making problem isformulated as a Markov Decision Process (MDP). We intend to use a simulation-based approach sinceit can break the "curse of dimensionality" and the "curse of modeling" for an MDP with largestate and control spaces. We focus on how to parameterize the state space and the control space.