Institute for Systems Research Technical Reports

Search Results

A Model Reference Adaptive Search Algorithm for Global Optimization
(2005) Hu, Jiaqiao; Fu, Michael C.; Marcus, Steven I.; ISR
Evolutionary Policy Iteration for Solving Markov Decision Processes
(2002) Chang, Hyeong Soo; Lee, Hong-Gi; Fu, Michael C.; Marcus, Steven I.; ISR; CSHCN
We propose a novel algorithm called Evolutionary Policy Iteration (EPI) for solving infinite-horizon discounted-reward Markov Decision Process (MDP) problems. EPI inherits the spirit of the well-known policy iteration (PI) algorithm but eliminates the need to maximize over the entire action space in the policy improvement step, so it should be most effective for problems with very large action spaces. EPI iteratively generates a "population", or set of policies, such that the performance of the population's "elite policy" improves monotonically with respect to a defined fitness function. EPI converges with probability one to a population whose elite policy is an optimal policy for the given MDP. EPI is naturally parallelizable, and in this context a distributed variant of PI is also studied.
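To make the population/elite idea concrete, here is a minimal sketch for a small tabular MDP. It assumes the MDP is given as lists (P[s][a] is a list of (probability, next_state) pairs, R[s][a] an expected reward), evaluates each policy exactly by iterative evaluation, builds the elite by "policy switching" (at each state, copy the action of the population member with the highest value at that state), and refills the population with randomly mutated copies of the elite. The function names, the mutation operator, and all parameters are illustrative assumptions, not the report's exact operators.

```python
import random

def evaluate(policy, P, R, gamma, tol=1e-10):
    """Iterative policy evaluation for a tabular MDP.

    P[s][a] is a list of (probability, next_state) pairs; R[s][a] is the expected reward.
    """
    V = [0.0] * len(P)
    while True:
        new_V = [R[s][policy[s]] + gamma * sum(p * V[t] for p, t in P[s][policy[s]])
                 for s in range(len(P))]
        if max(abs(x - y) for x, y in zip(new_V, V)) < tol:
            return new_V
        V = new_V

def policy_switch(population, values):
    """At each state, copy the action of the member whose value is highest at that state."""
    def best(s):
        return max(range(len(population)), key=lambda i: values[i][s])
    return [population[best(s)][s] for s in range(len(population[0]))]

def mutate(policy, n_actions, mut_prob, rng):
    """Hypothetical mutation: resample each state's action with probability mut_prob."""
    return [rng.randrange(n_actions) if rng.random() < mut_prob else a for a in policy]

def epi(P, R, gamma, pop_size=10, iters=100, mut_prob=0.2, seed=0):
    rng = random.Random(seed)
    n_states, n_actions = len(P), len(P[0])
    population = [[rng.randrange(n_actions) for _ in range(n_states)]
                  for _ in range(pop_size)]
    for _ in range(iters):
        values = [evaluate(pi, P, R, gamma) for pi in population]
        # The switched policy's value is >= every member's value, state by state.
        elite = policy_switch(population, values)
        # Keep the elite (so its value never decreases) and fill the rest with mutants.
        population = [elite] + [mutate(elite, n_actions, mut_prob, rng)
                                for _ in range(pop_size - 1)]
    return elite, evaluate(elite, P, R, gamma)

# Hypothetical 2-state, 2-action example: in state 1, action 0 stays and earns reward 1;
# in state 0, action 1 moves to state 1. The optimal policy is [1, 0].
P = [[[(1.0, 0)], [(1.0, 1)]], [[(1.0, 1)], [(1.0, 0)]]]
R = [[0.0, 0.0], [1.0, 0.0]]
elite, V = epi(P, R, gamma=0.9)
print(elite, V)
```

Note that the improvement step only compares the values of existing population members; it never maximizes over the action space, which is the property the abstract emphasizes. Because every policy can be produced by mutation with positive probability, the elite in this sketch eventually reaches an optimal policy.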
An Adaptive Sampling Algorithm for Solving Markov Decision Processes
(2002) Chang, Hyeong Soo; Fu, Michael C.; Marcus, Steven I.; ISR
Based on recent results for multi-armed bandit problems, we propose an adaptive sampling algorithm that approximates the optimal value of a finite horizon Markov decision process (MDP) with infinite state space but finite action space and bounded rewards. The algorithm adaptively chooses which action to sample as the sampling process proceeds, and it is proven that the estimate produced by the algorithm is asymptotically unbiased and the worst possible bias is bounded by a quantity that converges to zero at rate $O\left(\frac{H \ln N}{N}\right)$, where $H$ is the horizon length and $N$ is the total number of samples that are used per state sampled in each stage. The worst-case running-time complexity of the algorithm is $O((|A|N)^H)$, independent of the state space size, where $|A|$ is the size of the action space. The algorithm can be used to create an approximate receding horizon control to solve infinite horizon MDPs.
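The recursive structure described above can be sketched with a UCB1-style bandit rule at each sampled state. This is a minimal illustration under stated assumptions, not the report's exact algorithm: it assumes a generative model `simulate(x, a, rng) -> (reward, next_state)` with rewards in [0, r_max], scales the exploration bonus by the largest possible remaining reward, and returns the sample-count-weighted average of the per-action Q estimates. All names are hypothetical.

```python
import math
import random

def adaptive_value(x, stage, H, N, actions, simulate, r_max, rng):
    """Estimate the optimal value-to-go of state x at `stage` (0-based) of an H-stage MDP.

    simulate(x, a, rng) -> (reward, next_state) is an assumed generative model with
    rewards in [0, r_max]; N >= len(actions) samples are drawn at every sampled state.
    """
    if stage == H:
        return 0.0
    scale = (H - stage) * r_max          # upper bound on the remaining total reward
    counts = {a: 0 for a in actions}
    totals = {a: 0.0 for a in actions}

    def draw(a):
        reward, nxt = simulate(x, a, rng)
        # One-sample Q estimate: immediate reward + recursively estimated value-to-go.
        totals[a] += reward + adaptive_value(nxt, stage + 1, H, N, actions,
                                             simulate, r_max, rng)
        counts[a] += 1

    for a in actions:                    # sample every action once
        draw(a)
    for n in range(len(actions), N):     # then pick actions by the UCB1 index
        a = max(actions, key=lambda b: totals[b] / counts[b]
                + scale * math.sqrt(2.0 * math.log(n) / counts[b]))
        draw(a)
    # Weighted average of per-action Q estimates with weights N_a / N.
    return sum(totals.values()) / N

# Hypothetical one-state MDP where action a earns reward a (so the optimal 2-stage value is 2).
rng = random.Random(0)
estimate = adaptive_value(0, 0, H=2, N=8, actions=[0, 1],
                          simulate=lambda x, a, r: (float(a), x), r_max=1.0, rng=rng)
print(estimate)
```

Each call draws N samples and recurses one stage deeper for each, so the cost grows exponentially in the horizon but never touches the size of the state space. This matches the complexity claim above: the running time is independent of the state space size.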
Approximate Policy Iteration for Semiconductor Fab-Level Decision Making - a Case Study
(2000) He, Ying; Bhatnagar, Shalabh; Fu, Michael C.; Marcus, Steven I.; ISR
In this paper, we propose an approximate policy iteration (API) algorithm for a semiconductor fab-level decision making problem. This problem is formulated as a discounted cost Markov Decision Process (MDP), and we have applied exact policy iteration to solve a simple example in prior work. However, the overwhelming computational requirements of exact policy iteration prevent its application to larger problems. Approximate policy iteration overcomes this obstacle by approximating the cost-to-go using function approximation. Numerical simulation on the same example shows that the proposed API algorithm leads to a policy with cost close to that of the optimal policy.
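The evaluate/improve loop of approximate policy iteration can be sketched as follows. This is a generic illustration, not the report's fab model or its specific approximation architecture. It assumes a small tabular discounted-cost MDP, estimates each policy's cost-to-go by Monte Carlo rollouts, fits a linear approximation J(s) ≈ w · phi(s) by least squares, and improves the policy greedily with respect to that approximation. The feature map `phi`, the toy "line-world" MDP, and all parameter values are hypothetical.

```python
import random

def rollout_cost(s, policy, P, c, gamma, T, rng):
    """Discounted cost of one simulated trajectory of T steps from state s under `policy`."""
    total, discount = 0.0, 1.0
    for _ in range(T):
        a = policy[s]
        total += discount * c[s][a]
        discount *= gamma
        u, acc = rng.random(), 0.0
        for p, nxt in P[s][a]:            # sample s' from P[s][a] = [(prob, next_state), ...]
            acc += p
            if u < acc:
                break
        s = nxt
    return total

def solve(A, b):
    """Solve the small linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def fit_weights(features, targets):
    """Least-squares weights minimising sum_s (w . phi(s) - target_s)^2 via normal equations."""
    d = len(features[0])
    A = [[sum(f[i] * f[j] for f in features) for j in range(d)] for i in range(d)]
    b = [sum(f[i] * y for f, y in zip(features, targets)) for i in range(d)]
    return solve(A, b)

def approximate_policy_iteration(P, c, gamma, phi, policy, n_iters=10,
                                 rollouts=20, T=100, seed=0):
    rng = random.Random(seed)
    states = range(len(P))
    feats = [phi(s) for s in states]
    for _ in range(n_iters):
        # 1. Approximate evaluation: Monte Carlo cost estimates, then a linear fit.
        targets = [sum(rollout_cost(s, policy, P, c, gamma, T, rng)
                       for _ in range(rollouts)) / rollouts for s in states]
        w = fit_weights(feats, targets)

        def J(s):
            return sum(wi * fi for wi, fi in zip(w, feats[s]))

        # 2. Greedy improvement w.r.t. the approximate cost-to-go (costs are minimised).
        new_policy = [min(range(len(P[s])),
                          key=lambda a: c[s][a] + gamma * sum(p * J(t) for p, t in P[s][a]))
                      for s in states]
        if new_policy == policy:
            break
        policy = new_policy
    return policy, w

# Hypothetical 5-state line world: action 0 moves left, action 1 moves right,
# and the per-step cost equals the state index, so moving left is optimal.
n = 5
P = [[[(1.0, max(s - 1, 0))], [(1.0, min(s + 1, n - 1))]] for s in range(n)]
c = [[float(s)] * 2 for s in range(n)]
policy, w = approximate_policy_iteration(P, c, 0.9, lambda s: [1.0, float(s)], [1] * n)
print(policy)  # → [0, 0, 0, 0, 0]
```

The point of the approximation is that only the weight vector w (here two numbers) has to be stored and fitted, rather than one cost-to-go value per state, which is what makes the approach plausible for state spaces too large for exact policy iteration.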
Simulation-Based Approach for Semiconductor Fab-Level Decision Making - Implementation Issues
(2000) He, Ying; Fu, Michael C.; Marcus, Steven I.; ISR
In this paper, we discuss implementation issues of applying a simulation-based approach to a semiconductor fab-level decision making problem. The fab-level decision making problem is formulated as a Markov Decision Process (MDP). We intend to use a simulation-based approach since it can break the "curse of dimensionality" and the "curse of modeling" for an MDP with large state and control spaces. We focus on how to parameterize the state space and the control space.