Institute for Systems Research Technical Reports

Permanent URI for this collection: http://hdl.handle.net/1903/4376

This archive contains a collection of reports generated by the faculty and students of the Institute for Systems Research (ISR), a permanent, interdisciplinary research unit in the A. James Clark School of Engineering at the University of Maryland. ISR-based projects are conducted through partnerships with industry and government, bringing together faculty and students from multiple academic departments and colleges across the university.

Search Results

Now showing 1 - 10 of 22
  • Solving Continuous-State POMDPs via Density Projection
    (2007) Zhou, Enlu; Fu, Michael C.; Marcus, Steven I.
    Research on numerical solution methods for partially observable Markov decision processes (POMDPs) has primarily focused on discrete-state models, and these algorithms do not generally extend to continuous-state POMDPs, due to the infinite dimensionality of the belief space. In this paper, we develop a computationally viable and theoretically sound method for solving continuous-state POMDPs by effectively reducing the dimensionality of the belief space via density projections. The density projection technique is also incorporated into particle filtering to provide a filtering scheme for online decision making. We provide an error bound between the value function induced by the policy obtained by our method and the true value function of the POMDP, as well as an error bound between the projection particle filter and the optimal filter. Finally, we illustrate the effectiveness of our method through an inventory control problem.
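    As a rough illustration of the density projection idea (a hedged sketch, not the paper's exact construction), the code below projects a weighted particle belief onto a Gaussian by moment matching and uses that projection inside one particle filtering step; the transition and likelihood callables and the scalar state are assumptions made for the example.

        import numpy as np

        def project_to_gaussian(particles, weights):
            # Moment-matching projection of a weighted particle belief onto a
            # Gaussian density (a simple instance of projection onto an
            # exponential family).
            w = weights / weights.sum()
            mean = np.sum(w * particles)
            var = np.sum(w * (particles - mean) ** 2)
            return mean, var

        def projection_pf_step(particles, weights, transition, likelihood, obs, rng):
            # One illustrative step of a projection particle filter: propagate,
            # reweight by the observation likelihood, project the belief onto a
            # Gaussian, then resample from the projected density.
            propagated = transition(particles, rng)        # hypothetical transition sampler
            new_w = weights * likelihood(obs, propagated)  # hypothetical observation model
            mean, var = project_to_gaussian(propagated, new_w)
            resampled = rng.normal(mean, np.sqrt(var), size=len(particles))
            return resampled, np.full(len(particles), 1.0 / len(particles))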
  • Stochastic Gradient Estimation
    (2005) Fu, Michael C.; ISR
    We consider the problem of efficiently estimating gradients from stochastic simulation. Although the primary motivation is their use in simulation optimization, the resulting estimators can also be useful in other ways, e.g., sensitivity analysis. The main approaches described are finite differences (including simultaneous perturbations), perturbation analysis, the likelihood ratio/score function method, and the use of weak derivatives.
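    A minimal sketch of two of the estimators named above, assuming the simulation output is a noisy function f(theta, rng) of a parameter vector theta (a NumPy array): a coordinate-wise central finite-difference estimator and a simultaneous perturbation (SPSA-style) estimator that uses only two simulation runs regardless of dimension.

        import numpy as np

        def fd_gradient(f, theta, c=1e-2, reps=100, rng=None):
            # Central finite differences: perturb one coordinate at a time and
            # average `reps` replications of the noisy simulation per point.
            rng = rng or np.random.default_rng()
            g = np.zeros_like(theta, dtype=float)
            for i in range(len(theta)):
                e = np.zeros_like(theta, dtype=float)
                e[i] = c
                plus = np.mean([f(theta + e, rng) for _ in range(reps)])
                minus = np.mean([f(theta - e, rng) for _ in range(reps)])
                g[i] = (plus - minus) / (2 * c)
            return g

        def spsa_gradient(f, theta, c=1e-2, rng=None):
            # Simultaneous perturbation: a single random +/-1 perturbation of all
            # coordinates yields an estimate of the whole gradient from two runs.
            rng = rng or np.random.default_rng()
            delta = rng.choice([-1.0, 1.0], size=len(theta))
            return (f(theta + c * delta, rng) - f(theta - c * delta, rng)) / (2 * c * delta)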
  • Convergence of Sample Path Optimal Policies for Stochastic Dynamic Programming
    (2005) Fu, Michael C.; Jin, Xing; ISR
    We consider the solution of stochastic dynamic programs using sample path estimates. Applying the theory of large deviations, we derive probability error bounds associated with the convergence of the estimated optimal policy to the true optimal policy, for finite horizon problems. These bounds decay at an exponential rate, in contrast with the usual canonical (inverse) square root rate associated with estimation of the value (cost-to-go) function itself. These results have practical implications for Monte Carlo simulation-based solution approaches to stochastic dynamic programming problems where it is impractical to extract the explicit transition probabilities of the underlying system model.
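    A hedged sketch of the sample path idea, assuming a finite state set and hypothetical simulate_step and reward callables: the expectations in backward induction are replaced by sample averages over simulated transitions, and the estimated optimal policy is read off the resulting sample-average Q-values.

        import numpy as np

        def sample_path_dp(states, actions, simulate_step, reward, horizon, n_samples, rng):
            # Monte Carlo backward induction; simulate_step(s, a, rng) is assumed
            # to return a next state from the same finite state set.
            V = {s: 0.0 for s in states}             # terminal value
            policy = {}
            for t in reversed(range(horizon)):
                V_new, policy_t = {}, {}
                for s in states:
                    q = {}
                    for a in actions:
                        samples = [simulate_step(s, a, rng) for _ in range(n_samples)]
                        q[a] = reward(s, a) + np.mean([V[sp] for sp in samples])
                    policy_t[s] = max(q, key=q.get)  # estimated optimal action
                    V_new[s] = q[policy_t[s]]
                V, policy[t] = V_new, policy_t
            return V, policy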
  • A Model Reference Adaptive Search Algorithm for Global Optimization
    (2005) Hu, Jiaqiao; Fu, Michael C.; Marcus, Steven I.; ISR
  • An Evolutionary Random Search Algorithm for Solving Markov Decision Processes
    (2005) Hu, Jiaqiao; Fu, Michael C.; Ramezani, Vahid; Marcus, Steven I.; ISR
  • An Asymptotically Efficient Algorithm for Finite Horizon Stochastic Dynamic Programming Problems
    (2003) Chang, Hyeong Soo; Fu, Michael C.; Marcus, Steven I.; ISR
    We present a novel algorithm, called "Simulated Annealing Multiplicative Weights," for approximately solving large finite-horizon stochastic dynamic programming problems. The algorithm is "asymptotically efficient" in the sense that a finite-time bound for the sample mean of the optimal value function over a given finite policy space can be obtained, and the bound approaches the optimal value as the number of iterations increases. The algorithm updates a probability distribution over the given policy space with a very simple rule, and the sequence of distributions generated by the algorithm converges to a distribution concentrated only on the optimal policies for the given policy space. We also discuss how to reduce the computational cost of the algorithm so that it can be applied in practice.
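    The paper's exact update rule and annealing schedule are not reproduced here; as a hedged illustration, the sketch below applies a generic multiplicative-weights update over a finite policy space using simulated reward estimates, with a fixed learning rate eta standing in for the annealing schedule and simulate_return a hypothetical placeholder.

        import numpy as np

        def multiplicative_weights_policy_search(policies, simulate_return, iters, eta, rng):
            # Maintain a probability distribution over a finite policy space and
            # reweight it exponentially in each policy's simulated reward estimate.
            p = np.full(len(policies), 1.0 / len(policies))
            for _ in range(iters):
                rewards = np.array([simulate_return(pi, rng) for pi in policies])
                p = p * np.exp(eta * rewards)   # multiplicative-weights step
                p = p / p.sum()                 # renormalize to a distribution
            return p   # mass tends to concentrate on the best-performing policies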
  • A Distributed Algorithm for Solving a Class of Multi-agent Markov Decision Problems
    (2003) Chang, Hyeong Soo; Fu, Michael C.; ISR
    We consider a class of infinite horizon Markov decision processes (MDPs) with multiple decision makers, called agents, and a general joint reward structure, but a special decomposable state/action structure such that each individual agent's actions affect the system's state transitions independently from the actions of all other agents. We introduce the concept of "localization," whereby each agent need only consider a "local" MDP defined on its own state and action spaces. Based on this localization concept, we propose an iterative distributed algorithm that emulates gradient ascent and converges to a locally optimal solution for the average reward case. The solution is an "autonomous" joint policy such that each agent's action is based only on its local state. Finally, we discuss the implications of the localization concept for discounted reward problems.
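    To illustrate only the localization concept (the paper's gradient-ascent procedure for the average reward case is not reproduced), the sketch below solves a single agent's local MDP by discounted value iteration; under localization, each agent would run such a computation on its own state and action spaces, and the joint "autonomous" policy would map the joint state to the tuple of local actions.

        import numpy as np

        def local_value_iteration(P_local, r_local, gamma=0.95, tol=1e-8):
            # P_local[a] is the agent's |S_i| x |S_i| local transition matrix for
            # local action a, and r_local[a] the corresponding local reward vector
            # (both hypothetical inputs for this sketch).
            n = next(iter(P_local.values())).shape[0]
            V = np.zeros(n)
            while True:
                Q = {a: r_local[a] + gamma * P_local[a] @ V for a in P_local}
                V_new = np.max(np.stack(list(Q.values())), axis=0)
                if np.max(np.abs(V_new - V)) < tol:
                    break
                V = V_new
            # Greedy local policy: each agent's action depends only on its local state.
            return {s: max(Q, key=lambda a: Q[a][s]) for s in range(n)}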
  • Evolutionary Policy Iteration for Solving Markov Decision Processes
    (2002) Chang, Hyeong Soo; Lee, Hong-Gi; Fu, Michael C.; Marcus, Steven I.; ISR; CSHCN
    We propose a novel algorithm called Evolutionary Policy Iteration (EPI) for solving infinite horizon discounted reward Markov Decision Process (MDP) problems. EPI inherits the spirit of the well-known PI algorithm but eliminates the need to maximize over the entire action space in the policy improvement step, so it should be most effective for problems with very large action spaces. EPI iteratively generates a "population," or set of policies, such that the performance of the "elite policy" for a population is monotonically improved with respect to a defined fitness function. EPI converges with probability one to a population whose elite policy is an optimal policy for a given MDP. EPI is naturally parallelizable, and a distributed variant of PI is also studied.
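    A hedged sketch of the population/elite structure described above: evaluate and mutate are hypothetical stand-ins for the fitness function and for the paper's policy-generation step, and retaining the elite policy across generations gives the monotone improvement property.

        def evolutionary_policy_search(init_population, evaluate, mutate, generations, rng):
            # Keep the best-evaluated ("elite") policy across generations so its
            # fitness never decreases, and regenerate the rest of the population
            # from it; assumes generations >= 1.
            population = list(init_population)
            elite, elite_fit = None, float("-inf")
            for _ in range(generations):
                scored = [(evaluate(pi), pi) for pi in population]
                best_fit, best = max(scored, key=lambda t: t[0])
                if best_fit >= elite_fit:      # elitism: monotone improvement
                    elite, elite_fit = best, best_fit
                population = [elite] + [mutate(elite, rng) for _ in range(len(population) - 1)]
            return elite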
  • An Adaptive Sampling Algorithm for Solving Markov Decision Processes
    (2002) Chang, Hyeong Soo; Fu, Michael C.; Marcus, Steven I.; ISR
    Based on recent results for multi-armed bandit problems, we propose an adaptive sampling algorithm that approximates the optimal value of a finite horizon Markov decision process (MDP) with infinite state space but finite action space and bounded rewards. The algorithm adaptively chooses which action to sample as the sampling process proceeds, and it is proven that the estimate produced by the algorithm is asymptotically unbiased and the worst possible bias is bounded by a quantity that converges to zero at rate $O\left(\frac{H \ln N}{N}\right)$, where $H$ is the horizon length and $N$ is the total number of samples that are used per state sampled in each stage. The worst-case running-time complexity of the algorithm is $O((|A|N)^H)$, independent of the state space size, where $|A|$ is the size of the action space. The algorithm can be used to create an approximate receding horizon control to solve infinite horizon MDPs.
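    A hedged sketch of a bandit-style adaptive sampling recursion of this flavor, assuming hypothetical simulate and reward callables, N >= |A| samples per sampled state, and an illustrative UCB exploration term (not the paper's exact rule or averaging scheme).

        import math

        def adaptive_sample_value(state, stage, H, actions, simulate, reward, N):
            # Estimate the optimal value at (state, stage) of an H-stage MDP by
            # allocating the N samples across actions with an upper-confidence rule
            # and estimating each sampled next state recursively.
            if stage == H:
                return 0.0
            counts = {a: 0 for a in actions}
            q_sums = {a: 0.0 for a in actions}
            for a in actions:                    # initialization: one sample per action
                s_next = simulate(state, a)
                q_sums[a] += reward(state, a) + adaptive_sample_value(s_next, stage + 1, H, actions, simulate, reward, N)
                counts[a] += 1
            for n in range(len(actions), N):     # remaining samples go to the UCB-best action
                ucb = {a: q_sums[a] / counts[a] + math.sqrt(2 * math.log(n) / counts[a]) for a in actions}
                a = max(ucb, key=ucb.get)
                s_next = simulate(state, a)
                q_sums[a] += reward(state, a) + adaptive_sample_value(s_next, stage + 1, H, actions, simulate, reward, N)
                counts[a] += 1
            # Count-weighted average of the per-action estimates.
            return sum(counts[a] * (q_sums[a] / counts[a]) for a in actions) / N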
  • Approximate Policy Iteration for Semiconductor Fab-Level Decision Making - a Case Study
    (2000) He, Ying; Bhatnagar, Shalabh; Fu, Michael C.; Marcus, Steven I.; ISR
    In this paper, we propose an approximate policy iteration (API) algorithm for a semiconductor fab-level decision making problem. This problem is formulated as a discounted cost Markov Decision Process (MDP), and we have applied exact policy iteration to solve a simple example in prior work. However, the overwhelming computational requirements of exact policy iteration prevent its application to larger problems. Approximate policy iteration overcomes this obstacle by approximating the cost-to-go using function approximation. Numerical simulation on the same example shows that the proposed API algorithm leads to a policy with cost close to that of the optimal policy.
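    A minimal sketch of approximate policy iteration with a linear cost-to-go approximation, assuming hypothetical features, simulate_cost_to_go, transition_sample, and cost callables; for brevity, the improvement step uses a single sampled next state per action rather than an expectation.

        import numpy as np

        def approximate_policy_iteration(states, actions, features, simulate_cost_to_go,
                                         transition_sample, cost, gamma, n_iter, rng):
            # Alternate between (i) fitting the current policy's simulated cost-to-go
            # with a linear architecture J(s) ~ features(s) @ r by least squares and
            # (ii) improving the policy greedily against the fitted approximation.
            policy = {s: actions[0] for s in states}
            for _ in range(n_iter):
                Phi = np.array([features(s) for s in states])
                J_hat = np.array([simulate_cost_to_go(s, policy, rng) for s in states])
                r, *_ = np.linalg.lstsq(Phi, J_hat, rcond=None)
                for s in states:
                    policy[s] = min(actions, key=lambda a: cost(s, a)
                                    + gamma * features(transition_sample(s, a, rng)) @ r)
            return policy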