Browsing by Author "Marcus, Steven I."
Results Per Page
Sort Options
Item An Adaptive Sampling Algorithm for Solving Markov Decision Processes(2002) Chang, Hyeong Soo; Fu, Michael C.; Marcus, Steven I.; ISRBased on recent results for multi-armed bandit problems, we propose an adaptive sampling algorithm that approximates the optimal value of a finite horizon Markov decision process (MDP) with infinite state space but finite action space and bounded rewards. The algorithm adaptively chooses which action to sample as the sampling process proceeds, and it is proven that the estimate produced by the algorithm is asymptotically unbiased and the worst possible bias is bounded by a quantity that converges to zero at rate $Oleft ( rac{Hln N}{N} ight)$, where $H$ is the horizon length and $N$ is the total number of samples that are used per state sampled in each stage. The worst-case running-time complexity of the algorithm is $O((|A|N)^H)$, independent of the state space size, where $|A|$ is the size of the action space. The algorithm can be used to create an approximate receding horizon control to solve infinite horizon MDPs.Item Analysis of an Adaptive Control Scheme for a Partially Observed Controlled Markov Chain(1991) Fernandez-Gaucherand, Emmanuel; Arapostathis, Aristotle; Marcus, Steven I.; ISRWe consider an adaptive finite state controlled Markov chain with partial state information, motivated by a class of replacement problems. We present parameter estimation techniques based on the information available after actions that reset the state to known value are taken. We prove that the parameter estimates converge w.p. 1 to the true (unknown) parameter, under the feedback structure induced by a certainty equivalent adaptive policy. We also show that the adaptive policy is self- optimizing, in a long-run average sense, for any (measurable) sequence of parameter estimates converging w.p. 1 to the true parameter.Item Approximate Policy Iteration for Semiconductor Fab-Level Decision Making - a Case Study(2000) He, Ying; Bhatnagar, Shalabh; Fu, Michael C.; Marcus, Steven I.; Marcus, Steven I.; ISRIn this paper, we propose an approximate policy iteration (API) algorithm for asemiconductor fab-level decision making problem. This problem is formulated as adiscounted cost Markov Decision Process (MDP), and we have applied exact policy iterationto solve a simple example in prior work. However, the overwhelmingcomputational requirements of exact policy iteration prevent its application forlarger problems. Approximate policy iteration overcomes this obstacle by approximating thecost-to-go using function approximation. Numerical simulation on the same example showsthat the proposed API algorithm leads to a policy with cost close to that of the optimalpolicy.Item Approximate Receding Horizon Approach for Markov Decision Processes: Average Reward Case(2001) Chang, Hyeong Soo; Marcus, Steven I.; ISRBuilding on the receding horizon approach by Hernandez-Lerma andLasserre in solving Markov decision processes (MDPs),this paper first analyzes the performance of the (approximate) receding horizon approach in terms of infinite horizon average reward.In this approach, we fix a finite horizon and at each decision time, we solve the given MDP with the finite horizon for an approximately optimal current action and take the action to control the MDP.
We then analyze recently proposed on-line policy improvementscheme, "roll-out," by Bertsekas and Castanon, and a generalization of the rollout algorithm, "parallel rollout" by Chang et al., in terms of the infinite horizon average reward in the framework of the (approximate) receding horizon control.
We finally discuss practical implementations of these schemes via simulation.
Item An Asymptotically Efficient Algorithm for Finite Horizon Stochastic Dynamic Programming Problems(2003) Chang, Hyeong Soo; Fu, Michael C.; Marcus, Steven I.; Fu, Michael C.; Marcus, Steven I.; ISRWe present a novel algorithm, called ``Simulated Annealing Multiplicative Weights", for approximately solving large finite-horizon stochastic dynamic programming problems. The algorithm is ``asymptotically efficient" in the sense that a finite-time bound for the sample mean of the optimal value function over a given finite policy space can be obtained, and the bound approaches the optimal value as the number of iterations increases.The algorithm updates a probability distribution over the given policy space with a very simple rule, and the sequence of distributions generated by the algorithm converges to a distribution concentrated only on the optimal policies for the given policy space. We also discuss how to reduce the computational cost of the algorithm to apply it in practice.Item Coloring Rooted Subtrees on a Bounded Degree Host Tree(2007) Rawat, Anuj; Shayman, Mark; La, Richard J.; Marcus, Steven I.We consider a rooted tree R to be a rooted subtree of a given tree T if the tree obtained by replacing the directed arcs of R by undirected edges is a subtree of T. In this work, we study the problem of assigning colors to a given set of rooted subtrees of a given host tree such that if any two rooted subtrees share a directed edge, then they are assigned different colors. The objective is to minimize the total number of colors used in the coloring. The problem is NP hard even in the case when the degree of the host tree is restricted to 3. This problem is motivated by the problem of assigning wavelengths to multicast traffic requests in all-optical tree networks. We present a greedy coloring scheme for the case when the degree of the host tree is restricted to 3 and prove that it is a 5/2-approximation algorithm. We then present another simpler coloring scheme and prove that it is an approximation algorithm for the problem with approximation ratio 10/3, 3 and 2 for the cases when the degree of the host tree is restricted to 4, 3, and 2, respectively.Item Controlled Markov Processes on the Infinite Planning Horizon: Weighted and, Overtaking Cost Criteria(1993) Fernandez-Gaucherand, Emmanuel; Ghosh, Mrinal K.; Marcus, Steven I.; ISRStochastic control problems for controlled Markov processes models with an infinite planning horizon are considered, under some non-standard cost criteria. The classical discounted and average cost criteria can be viewed as complementary, in the sense that the former captures the short-time and the latter the long-time performance of the system. Thus, we study a cost criterion obtained as weighted combinations of these criteria, extending to a general state and control space framework several recent results by Feinberg and Shwartz, and by Krass et al. In addition, a functional characterization is given for overtaking optimal policies, for problems with countable state spaces and compact control spaces; our approach is based on qualitative properties of the optimality equation for problems with an average cost criterion.Item Design of Protocol Converters: A Discrete Event Systems Approach(1995) Kumar, Ratnesh; Nelvagal, S.; Marcus, Steven I.; ISRA protocol mismatch occurs when heterogeneous networks try to communicate with each other. Such mismatches are inevitable due to the proliferation of a multitude of networking architectures, hardware and software on one hand, and the need for global connectivity on the other hand. Global standardization of protocols will avoid such problems, but it may take years to be agreed upon, leaving communication problems for the present. So the alternative solution of protocol conversion has been proposed. In this paper we present a systematic approach to protocol conversion using the recent theory of supervisory control of discrete event systems. We study the problem of designing a converter for a given mismatched pair of protocols, using their specifications and the specifications for the channel and the user services. We introduce the notion of converter languages, use it obtain a necessary and sufficient condition for the existence of protocol converter and present an effective algorithm for computing it whenever it exists.Item A Discrete Event Systems Approach for Protocol Conversion(1997) Kumar, Ratnesh; Nelvagal, S.; Marcus, Steven I.; ISRA Protocol mismatch occurs when heterogeneous networks try to communicate with each other. Such mismatches are inevitable due to the proliferation of a multitude of networking architectures, hardware, and software on one hand, and the need for global connectivity on the other hand. In order to circumvent this problem the solution of protocol conversion has been proposed. In this paper we present a systematic approach to protocol conversion using the theory of supervisory control of discrete event systems, which was partially first addressed by Inan. We study the problem of designing a converter for a given mismatched pair of protocols, using their specifications, and the specifications for the channel and the user services. We introduce the notion of converter languages and use it to obtain a necessary and sufficient condition for the existence of protocol converter and present an effective algorithm for computing it whenever it exists.Item Discrete-Time Controlled Markov Processes with Average Cost Criterion: A Survey(1991) Arapostathis, Aristotle; Borkar, Vivek S.; Fernandez-Gaucherand, Emmanuel; Ghosh, Mrinal K.; Marcus, Steven I.; ISRThis work is a survey of the average cost control problem for discrete-time Markov processes. We have attempted to put together a comprehensive account of the considerable research on this problem over the past three decades. Our exposition ranges from finite to Borel state and action spaces and includes a variety of methodologies to find and characterize optimal policies. We have included a brief historical perspective of the research efforts in this area and have compiled a substantial yet not exhaustive bibliography. We have also identified several important questions which are still left open to investigation.Item Ergodic Control of Switching Diffusions(1992) Ghosh, Mrinal K.; Arapostathis, Aristotle; Marcus, Steven I.; ISRWe study the ergodic control problem of switching diffusions representing a typical hybrid system that arises in numerous applications such as fault tolerant control systems, flexible manufacturing systems, etc. Under certain conditions, we establish the existence of a stable Markov nonrandomized policy which is almost surely optimal for a pathwise longrun average cost criterion. We then study the corresponding Hamilton-Jacobi- Bellman (HJB) equation and establish the existence of a unique solution in a certain class. Using this, we characterize the optimal policy as a minimizing selector of the Hamiltonian associated with the HJB equations. We apply these results to a failure prone manufacturing system and show that the optimal production rate is of the hedging point type.Item Estimation of Hidden Markov Models(2001) Ramezani, Vahid Reza; Marcus, Steven I.; ISRA risk-sensitive generalization of the Maximum A Posterior Probability (MAP) estimationfor partially observed Markov chains is presented.Using a change of measure technique,a cascade filtering scheme for the risk-sensitivestate estimation is introduced. Structural results,the influence of the availability of information, mixing and non-mixingdynamics, and the connection with other risk-sensitive estimation methodsare considered. A qualitative analysis of the samplepaths clarifies the underlying mechanism.Item Evolutionary Policy Iteration for Solving Markov Decision Processes(2002) Chang, Hyeong Soo; Lee, Hong-Gi; Fu, Michael C.; Marcus, Steven I.; ISR; CSHCNWe propose a novel algorithm called Evolutionary Policy Iteration (EPI) for solving infinite horizon discounted reward Markov Decision Process (MDP) problems. EPI inherits the spirit of the well-known PI algorithm but eliminates the need to maximize over the entire action space in the policy improvement step, so it should be most effective for problems with very large action spaces. EPI iteratively generates a "population" or a set of policies such that the performance of the "elite policy" for a population is monotonically improved with respect to a defined fitness function. EPI converges with probability one to a population whose elite policy is an optimal policy for a given MDP. EPI is naturally parallelizable and along this discussion, a distributed variant of PI is also studied.Item An Evolutionary Random Search Algorithm for Solving Markov Decision Processes(2005) Hu, Jiaqiao; Fu, Michael C.; Ramezani, Vahid; Marcus, Steven I.; Fu, Michael; ISRItem Existence of Risk Sensitive Optimal Stationary Policies for Controlled Markov Processes(1997) Hernandez-Hernandez, Daniel; Marcus, Steven I.; ISRIn this paper we are concerned with the existence of optimal stationary policies for infinite horizon risk sensitive Markov control processes with denumerable state space, unbounded cost function, and long run average cost. Introducing a discounted cost dynamic game, we prove that its value function satisfies an Isaacs equation, and its relationship with the risk sensitive control problem is studied. Using the vanishing discount approach, we prove that the risk-sensitive dynamic programming inequality holds, and derive an optimal stationary policy.Item Extension Based Limited Lookahead Supervision of Discrete Event Systems(1995) Kumar, Ratnesh; Cheung, Hok M.; Marcus, Steven I.; ISRSupervisory control of discrete event systems using limited lookahead has been studied by Chung-Lafortune-Lin, where control is computed by truncating the plant behavior up to the limited lookahead window. We present a different approach in which the control is computed by extending the plant behavior by arbitrary traces beyond the limited lookahead window. The proposed supervisor avoids the notion of pending traces. Consequently the need for considering either a conservative or an optimistic attitude regarding pending traces (as in the work of Chung- Lafortune-Lin) does not arise. It was shown that an optimistic attitude may result in violation of the desired specifications. We demonstrate here that a conservative attitude may result in a restrictive control policy by showing that in some cases the proposed supervisor is less restrictive than the conservative attitude-based supervisor. Moreover, the proposed approach uses the notion of relative closure to construct the supervisor so that it is non-blocking even when the desired behavior is not relative closed (Chung-Lafortune-Lin assume relative closure). Finally, the proposed supervisor possesses all the desirable properties that a conservative attitude based supervisor of Chung-Lafortune-Lin possesses. We illustrate our approach by applying it to concurrency control in database management systems.Item Finite Buffer Realization of Input-Output Discrete Event Systems(1994) Kumar, Ratnesh; Garg, Vijay K.; Marcus, Steven I.; ISRMany discrete event systems (DESs) such as manufacturing systems, data base management systems, communication networks, traffic systems, etc., can be modeled as input-output discrete event systems (I/O DESs). In this paper we formulate and study the problem of stable realization of such systems in the logical setting. Given an input and an output language describing the sequences of events that occur at the input and the output, respectively, of an I/O DES, we study whether it is possible to realize the system as a unit consisting of a given set of buffers of finite capacity, called a dispatching unit. The notions of stable, conditionally stable, dispatchable and conditionally dispatchable units are introduced as existence of stable (or input-output bounded), and causal (or prefix preserving) input- output maps, and effectively computable necessary and sufficient conditions for testing them are obtained.Item A Framework for Mixed Estimation of Hidden Markov Models(1998) Dey, Subhrakanti; Marcus, Steven I.; ISRIn this paper, we present a framework for a mixed estimationscheme for hidden Markov models (HMM).A robust estimation scheme is first presented using the minimax method thatminimizes a worst case cost for HMMs with bounded uncertainties.Then we present a mixed estimation scheme that minimizes arisk-neutral cost with a constraint on the worst-case cost. Somesimulation results are also presented to compare these different estimationschemes in cases of uncertainties in the noise model.The research and scientific content in this material has been accepted for presentation in the 37th IEEE Conference on Decision and Control, Tampa, December 1998. Item Harnack's Inequality for Cooperative Weakly Coupled Elliptic Systems(1997) Arapostathis, Aristotle; Ghosh, Mrinal K.; Marcus, Steven I.; ISRWe consider cooperative, uniformly elliptic systems, with bounded coefficients and coupling in the zeroth-order terms. We establish two analogues of Harnack's inequality for this class of systems. A weak version is obtained under fairly general conditions, while imposing an irreducibility condition on the coupling coefficients we obtain a stronger version of the inequality. This irreducibility condition is also necessary for the existience of a Harnack constant for this class of systems. A Harnack inequality is also obtained for a class of superharmonic functions.Item Markov Games: Receding Horizon Approach(2001) Chang, Hyeong Soo; Marcus, Steven I.; ISRWe consider a receding horizon approach as an approximate solution totwo-person zero-sum Markov games with infinite horizon discounted costand average cost criteria.We first present error bounds from the optimalequilibrium value of the gamewhen both players take correlated equilibrium receding horizon policiesthat are based on emph{exact} or emph{approximate} solutions of recedingfinite horizon subgames. Motivated by the worst-case optimal control ofqueueing systems by Altman, we then analyze error boundswhen the minimizer plays the (approximate) receding horizon control andthe maximizer plays the worst case policy.
We give three heuristicexamples of the approximate receding horizon control. We extend"rollout" by Bertsekas and Castanon and"parallel rollout" and "hindsight optimization" byChang {et al.) into the Markov game settingwithin the framework of the approximate receding horizon approach andanalyze their performances.
From the rollout/parallel rollout approaches, the minimizing player seeks to improve the performance of a single heuristic policy it rolls out or to combine dynamically multiple heuristic policies in a set to improve theperformances of all of the heuristic policies simultaneously under theguess that the maximizing player has chosen a fixed worst-case policy. Given $epsilon > 0$, we give the value of the receding horizon whichguarantees that the parallel rollout policy with the horizon played by the minimizer emph{dominates} any heuristic policy in the set by $epsilon$.From the hindsight optimization approach, the minimizing player makes a decision based on his expected optimal hindsight performance over a finite horizon.
We finally discuss practical implementations of the receding horizon approaches via simulation.
- «
- 1 (current)
- 2
- 3
- »