Institute for Systems Research Technical Reports

Permanent URI for this collection: http://hdl.handle.net/1903/4376

This archive contains a collection of reports generated by the faculty and students of the Institute for Systems Research (ISR), a permanent, interdisciplinary research unit in the A. James Clark School of Engineering at the University of Maryland. ISR-based projects are conducted through partnerships with industry and government, bringing together faculty and students from multiple academic departments and colleges across the university.

Search Results

Now showing 1 - 10 of 45
  • Item
    Coloring Rooted Subtrees on a Bounded Degree Host Tree
    (2007) Rawat, Anuj; Shayman, Mark; La, Richard J.; Marcus, Steven I.
    We consider a rooted tree R to be a rooted subtree of a given tree T if the tree obtained by replacing the directed arcs of R by undirected edges is a subtree of T. In this work, we study the problem of assigning colors to a given set of rooted subtrees of a given host tree such that if any two rooted subtrees share a directed edge, then they are assigned different colors. The objective is to minimize the total number of colors used in the coloring. The problem is NP-hard even when the degree of the host tree is restricted to 3. This problem is motivated by the problem of assigning wavelengths to multicast traffic requests in all-optical tree networks. We present a greedy coloring scheme for the case when the degree of the host tree is restricted to 3 and prove that it is a 5/2-approximation algorithm. We then present another, simpler coloring scheme and prove that it is an approximation algorithm for the problem with approximation ratios 10/3, 3, and 2 for the cases when the degree of the host tree is restricted to 4, 3, and 2, respectively.
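    The following is a minimal first-fit sketch of the coloring problem described in this abstract, written in Python as an illustration only: each rooted subtree is represented by the set of directed host-tree edges it uses, and two subtrees conflict exactly when they share a directed edge. The edge-set representation is an assumption made here, and this generic greedy is not the 5/2-approximation scheme of the report.
    ```python
    def first_fit_coloring(subtrees):
        """subtrees: list of sets of directed edges, e.g. {('a', 'b'), ('b', 'c')}.
        Returns a list of colors (nonnegative integers), one per subtree, such that
        subtrees sharing a directed edge receive different colors."""
        colors = []                                   # colors[i] = color of subtree i
        for i, edges in enumerate(subtrees):
            used = {colors[j] for j in range(i)       # colors of already-colored subtrees
                    if subtrees[j] & edges}           # that share a directed edge with this one
            c = 0
            while c in used:                          # smallest color not in conflict
                c += 1
            colors.append(c)
        return colors

    # Example on a path a->b->c: the first two subtrees share the directed edge (a, b).
    print(first_fit_coloring([{('a', 'b')}, {('a', 'b'), ('b', 'c')}, {('b', 'c')}]))
    ```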
  • Item
    Solving Continuous-State POMDPs via Density Projection
    (2007) Zhou, Enlu; Fu, Michael C.; Marcus, Steven I.
    Research on numerical solution methods for partially observable Markov decision processes (POMDPs) has primarily focused on discrete-state models, and these algorithms do not generally extend to continuous-state POMDPs, due to the infinite dimensionality of the belief space. In this paper, we develop a computationally viable and theoretically sound method for solving continuous-state POMDPs by effectively reducing the dimensionality of the belief space via density projections. The density projection technique is also incorporated into particle filtering to provide a filtering scheme for online decision making. We provide an error bound between the value function induced by the policy obtained by our method and the true value function of the POMDP, and also an error bound between the projection particle filter and the optimal filter. Finally, we illustrate the effectiveness of our method through an inventory control problem.
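    As a small illustration of the density projection idea, the Python sketch below projects a weighted particle approximation of the belief onto a Gaussian family by matching its first two moments. The Gaussian family and the function names are assumptions made here for illustration; the report's projection may use a different exponential family and a different filtering recursion.
    ```python
    import numpy as np

    def project_to_gaussian(particles, weights):
        """Project a weighted particle set onto a Gaussian by moment matching:
        return the weighted mean and weighted covariance of the particles."""
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()                               # normalize the particle weights
        x = np.asarray(particles, dtype=float)        # shape (N, d)
        mean = w @ x                                  # first moment
        centered = x - mean
        cov = (centered * w[:, None]).T @ centered    # second (central) moment
        return mean, cov

    # Toy usage: 1000 standard-normal particles in R^2 with uniform weights.
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(1000, 2))
    mu, sigma = project_to_gaussian(pts, np.ones(1000))
    print(mu, sigma)
    ```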
  • Item
    A Model Reference Adaptive Search Algorithm for Global Optimization
    (2005) Hu, Jiaqiao; Fu, Michael C.; Marcus, Steven I.; ISR
  • Item
    An Evolutionary Random Search Algorithm for Solving Markov Decision Processes
    (2005) Hu, Jiaqiao; Fu, Michael C.; Ramezani, Vahid; Marcus, Steven I.; ISR
  • Item
    An Asymptotically Efficient Algorithm for Finite Horizon Stochastic Dynamic Programming Problems
    (2003) Chang, Hyeong Soo; Fu, Michael C.; Marcus, Steven I.; ISR
    We present a novel algorithm, called "Simulated Annealing Multiplicative Weights", for approximately solving large finite-horizon stochastic dynamic programming problems. The algorithm is "asymptotically efficient" in the sense that a finite-time bound for the sample mean of the optimal value function over a given finite policy space can be obtained, and the bound approaches the optimal value as the number of iterations increases. The algorithm updates a probability distribution over the given policy space with a very simple rule, and the sequence of distributions generated by the algorithm converges to a distribution concentrated only on the optimal policies for the given policy space. We also discuss how to reduce the computational cost of the algorithm so that it can be applied in practice.
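    Below is a minimal Python sketch of a multiplicative-weights update over a finite policy space, assuming a hypothetical simulate_return(policy, rng) routine that draws one sampled finite-horizon return for a policy. It shows only the generic update (reweight each policy by an exponential of its sampled return and renormalize); the report's exact rule, step sizes, and convergence conditions may differ.
    ```python
    import numpy as np

    def samw_style_sketch(policies, simulate_return, iterations=100, eta=0.1, seed=0):
        """Maintain a distribution over a finite policy set and update it
        multiplicatively using sampled returns; mass drifts toward high-value policies."""
        rng = np.random.default_rng(seed)
        q = np.full(len(policies), 1.0 / len(policies))   # start from the uniform distribution
        for _ in range(iterations):
            returns = np.array([simulate_return(p, rng) for p in policies])
            q = q * np.exp(eta * returns)                 # multiplicative, reward-weighted update
            q = q / q.sum()                               # renormalize to a probability distribution
        return q

    # Toy usage: three "policies" whose sampled returns are noisy constants 0.2, 0.5, 0.9.
    print(samw_style_sketch([0.2, 0.5, 0.9], lambda mean, rng: rng.normal(mean, 0.1)))
    ```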
  • Item
    Evolutionary Policy Iteration for Solving Markov Decision Processes
    (2002) Chang, Hyeong Soo; Lee, Hong-Gi; Fu, Michael C.; Marcus, Steven I.; ISR; CSHCN
    We propose a novel algorithm called Evolutionary Policy Iteration (EPI) for solving infinite horizon discounted reward Markov Decision Process (MDP) problems. EPI inherits the spirit of the well-known PI algorithm but eliminates the need to maximize over the entire action space in the policy improvement step, so it should be most effective for problems with very large action spaces. EPI iteratively generates a "population," or set of policies, such that the performance of the "elite policy" for a population is monotonically improved with respect to a defined fitness function. EPI converges with probability one to a population whose elite policy is an optimal policy for the given MDP. EPI is naturally parallelizable, and in the course of this discussion a distributed variant of PI is also studied.
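    The Python sketch below is a schematic reading of a population-based policy search in this spirit: evaluate every policy in the population, form an "elite" policy by policy switching (at each state, take the action of the policy whose value is best there), and mutate the elite to produce the next generation. The evaluate(policy) routine and the mutation rate are hypothetical placeholders, not operators taken from the report.
    ```python
    import numpy as np

    def evolutionary_policy_sketch(num_states, num_actions, evaluate,
                                   generations=20, population_size=10, seed=0):
        """evaluate(policy) is assumed to return the policy's value at every state
        (an array of length num_states) for a deterministic policy given as an
        array of action indices."""
        rng = np.random.default_rng(seed)
        population = [rng.integers(num_actions, size=num_states)
                      for _ in range(population_size)]
        elite = population[0]
        for _ in range(generations):
            values = np.stack([evaluate(p) for p in population])
            # Policy switching: at each state, copy the action of the policy whose
            # value is highest at that state; the resulting policy dominates the population.
            best_idx = np.argmax(values, axis=0)
            elite = np.array([population[i][s] for s, i in enumerate(best_idx)])
            # Next generation: keep the elite and add randomly mutated copies of it.
            population = [elite] + [np.where(rng.random(num_states) < 0.2,
                                             rng.integers(num_actions, size=num_states),
                                             elite)
                                    for _ in range(population_size - 1)]
        return elite

    # Toy usage: 4 states, 3 actions, and a made-up evaluation that rewards action 2 everywhere.
    print(evolutionary_policy_sketch(4, 3, lambda p: (p == 2).astype(float)))
    ```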
  • Item
    Risk-Sensitive Probability for Markov Chains
    (2002) Ramezani, Vahid R.; Marcus, Steven I.; ISR
    The probability distribution of a Markov chain is viewed as the information state of an additive optimization problem. This optimization problem is then generalized to a product form whose information state gives rise to a generalized notion of probability distribution for Markov chains. The evolution and the asymptotic behavior of this generalized or "risk-sensitive" probability distribution are studied in this paper, and a conjecture is proposed regarding the asymptotic periodicity of risk-sensitive probability. The relation between a set of simultaneous non-linear equations and the set of periodic attractors is analyzed.

  • Item
    An Adaptive Sampling Algorithm for Solving Markov Decision Processes
    (2002) Chang, Hyeong Soo; Fu, Michael C.; Marcus, Steven I.; ISR
    Based on recent results for multi-armed bandit problems, we propose an adaptive sampling algorithm that approximates the optimal value of a finite horizon Markov decision process (MDP) with infinite state space but finite action space and bounded rewards. The algorithm adaptively chooses which action to sample as the sampling process proceeds, and it is proven that the estimate produced by the algorithm is asymptotically unbiased and the worst possible bias is bounded by a quantity that converges to zero at rate $O\left(\frac{H \ln N}{N}\right)$, where $H$ is the horizon length and $N$ is the total number of samples that are used per state sampled in each stage. The worst-case running-time complexity of the algorithm is $O((|A|N)^H)$, independent of the state space size, where $|A|$ is the size of the action space. The algorithm can be used to create an approximate receding horizon control to solve infinite horizon MDPs.
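    The Python sketch below shows the general shape of such bandit-style adaptive sampling: at each visited state, actions are chosen by an upper confidence bound (UCB) rule, and the value of each sampled next state is estimated recursively with a shorter horizon. The simulate(state, action, rng) generative model and the final averaging step are assumptions made for illustration; the report's exact estimator and constants may differ.
    ```python
    import math
    import numpy as np

    def ucb_value_estimate(state, horizon, actions, simulate, N=16, rng=None):
        """Estimate the finite-horizon value of `state` using N adaptive samples per stage.
        simulate(state, action, rng) is assumed to return (reward, next_state)."""
        if rng is None:
            rng = np.random.default_rng(0)
        if horizon == 0:
            return 0.0
        counts = {a: 0 for a in actions}
        q_sum = {a: 0.0 for a in actions}
        for i in range(1, N + 1):
            untried = [a for a in actions if counts[a] == 0]
            if untried:                               # sample every action at least once
                a = untried[0]
            else:                                     # then follow the UCB rule
                a = max(actions, key=lambda a: q_sum[a] / counts[a]
                        + math.sqrt(2.0 * math.log(i) / counts[a]))
            reward, next_state = simulate(state, a, rng)
            q_sum[a] += reward + ucb_value_estimate(next_state, horizon - 1,
                                                    actions, simulate, N, rng)
            counts[a] += 1
        # Sample-count-weighted average of the per-action estimates.
        return sum(q_sum[a] for a in actions) / N

    # Toy usage: two actions, reward 1 for action 'b', horizon 2, a single dummy state.
    print(ucb_value_estimate("s0", 2, ["a", "b"],
                             lambda s, a, rng: (1.0 if a == "b" else 0.0, "s0")))
    ```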
  • Item
    Multi-time Scale Markov Decision Processes
    (2002) Chang, Hyeong Soo; Fard, Pedram; Marcus, Steven I.; Shayman, Mark; ISR
    This paper proposes a simple analytical model called the M time-scale Markov Decision Process (MMDP) for hierarchically structured sequential decision making processes, where decisions at each level of the M-level hierarchy are made on M different time-scales.

    In this model, the state space and the control space of each level in the hierarchy are non-overlapping with those of the other levels, and the hierarchy is structured in a "pyramid" sense: a decision made at level m (the slower time-scale) and/or the state at that level affects the evolution of the decision-making process at the lower level m+1 (the faster time-scale) until a new decision is made at the higher level, but the lower-level decisions themselves do not affect the higher level's transition dynamics. The performance produced by the lower level's decisions does, however, affect the higher level's decisions.

    A hierarchical objective function is defined such that the finite-horizon value of following a (nonstationary) policy at level m+1 over a decision epoch of level m, plus an immediate reward at level m, is the single-step reward for the level-m decision making process. From this we define a "multi-level optimal value function" and derive a "multi-level optimality equation."

    We discuss how to solve MMDPs exactly or approximately and also study heuristic on-line methods to solve MMDPs. Finally, we give some example control problems that can be modeled as MMDPs.
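    A schematic reading of the hierarchical reward described above can be written as follows, with all notation (level-m state x_m, action a_m, epoch length T_m, per-level rewards r_m, level-(m+1) policy pi_{m+1}) assumed here rather than taken from the report:
    ```latex
    % Single-step reward of the level-m decision process: the immediate level-m reward
    % plus the finite-horizon value earned by the level-(m+1) policy over the T_m
    % faster-time-scale steps of one level-m decision epoch.
    R_m(x_m, a_m) \;=\; r_m(x_m, a_m)
      \;+\; \mathbb{E}\left[ \sum_{t=0}^{T_m - 1}
            r_{m+1}\bigl(x_{m+1,t}, \pi_{m+1,t}(x_{m+1,t})\bigr)
            \,\middle|\, x_m, a_m \right]
    ```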

  • Item
    Markov Games: Receding Horizon Approach
    (2001) Chang, Hyeong Soo; Marcus, Steven I.; ISR
    We consider a receding horizon approach as an approximate solution to two-person zero-sum Markov games with infinite horizon discounted cost and average cost criteria.

    We first present error bounds from the optimal equilibrium value of the game when both players take correlated equilibrium receding horizon policies that are based on exact or approximate solutions of receding finite horizon subgames. Motivated by the worst-case optimal control of queueing systems by Altman, we then analyze error bounds when the minimizer plays the (approximate) receding horizon control and the maximizer plays the worst case policy.

    We give three heuristic examples of the approximate receding horizon control. We extend "rollout" by Bertsekas and Castanon and "parallel rollout" and "hindsight optimization" by Chang et al. into the Markov game setting within the framework of the approximate receding horizon approach and analyze their performances.

    From the rollout/parallel rollout approaches, the minimizing player seeks to improve the performance of a single heuristic policy it rolls out, or to combine dynamically multiple heuristic policies in a set to improve the performances of all of the heuristic policies simultaneously, under the guess that the maximizing player has chosen a fixed worst-case policy. Given $\epsilon > 0$, we give the value of the receding horizon which guarantees that the parallel rollout policy with the horizon played by the minimizer dominates any heuristic policy in the set by $\epsilon$. From the hindsight optimization approach, the minimizing player makes a decision based on his expected optimal hindsight performance over a finite horizon.

    We finally discuss practical implementations of the receding horizon approaches via simulation.
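    As a toy illustration of the worst-case variant described above (the minimizer plays a receding horizon control while the maximizer responds adversarially), the Python sketch below solves a deterministic finite-horizon subgame by exhaustive min-max lookahead and plays only the first-stage minimizing action. The deterministic simulate(state, a, b) model is an assumption made here; the report treats stochastic games, correlated equilibria, and simulation-based rollout policies, none of which are captured by this sketch.
    ```python
    def minimax_lookahead(state, horizon, min_actions, max_actions, simulate):
        """Worst-case cost-to-go over `horizon` stages when the minimizer plays optimally
        and the maximizer picks the worst response at every stage.
        simulate(state, a, b) is assumed to return (cost, next_state) deterministically."""
        if horizon == 0:
            return 0.0
        best = float("inf")
        for a in min_actions:
            worst = float("-inf")
            for b in max_actions:
                cost, nxt = simulate(state, a, b)
                worst = max(worst, cost + minimax_lookahead(nxt, horizon - 1,
                                                            min_actions, max_actions, simulate))
            best = min(best, worst)                   # minimizer picks the action with the best worst case
        return best

    def receding_horizon_min_action(state, horizon, min_actions, max_actions, simulate):
        """Solve the finite-horizon subgame rooted at the current state and return
        only its first-stage minimizing action (the receding horizon control)."""
        def worst_case_q(a):
            outcomes = [simulate(state, a, b) for b in max_actions]
            return max(cost + minimax_lookahead(nxt, horizon - 1,
                                                min_actions, max_actions, simulate)
                       for cost, nxt in outcomes)
        return min(min_actions, key=worst_case_q)
    ```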