Browsing by Author "Fu, Michael C."
Now showing 1 - 20 of 27
Results Per Page
Sort Options
Item An Adaptive Sampling Algorithm for Solving Markov Decision Processes(2002) Chang, Hyeong Soo; Fu, Michael C.; Marcus, Steven I.; ISRBased on recent results for multi-armed bandit problems, we propose an adaptive sampling algorithm that approximates the optimal value of a finite horizon Markov decision process (MDP) with infinite state space but finite action space and bounded rewards. The algorithm adaptively chooses which action to sample as the sampling process proceeds, and it is proven that the estimate produced by the algorithm is asymptotically unbiased and the worst possible bias is bounded by a quantity that converges to zero at rate $Oleft ( rac{Hln N}{N} ight)$, where $H$ is the horizon length and $N$ is the total number of samples that are used per state sampled in each stage. The worst-case running-time complexity of the algorithm is $O((|A|N)^H)$, independent of the state space size, where $|A|$ is the size of the action space. The algorithm can be used to create an approximate receding horizon control to solve infinite horizon MDPs.Item Application of Perturbation Analysis to the Design and Analysis of Control Charts(1997) Fu, Michael C.; Hu, Jian-Qiang; ISRThe design of control charts in statistical quality control addresses the optimal selection of the design parameters such as the sampling frequency and the control limits; and includes sensitivity analysis with respect to system parameters such as the various process parameters and the economic costs of sampling. The advent of more complicated control chart schemes has necessitated the use of Monte Carlo simulation in the design process, particularly in the evaluation of performance measures such as average run length. In this paper, we apply perturbation analysis to derive gradient estimators that can be used in gradient-based optimization algorithms and in sensitivity analysis when Monte Carlo simulation is employed. We illustrate the technique on a simple Shewhart control chart and on a more complicated control chart that includes the exponentially- weighted moving average control chart as a special case.Item Approximate Policy Iteration for Semiconductor Fab-Level Decision Making - a Case Study(2000) He, Ying; Bhatnagar, Shalabh; Fu, Michael C.; Marcus, Steven I.; Marcus, Steven I.; ISRIn this paper, we propose an approximate policy iteration (API) algorithm for asemiconductor fab-level decision making problem. This problem is formulated as adiscounted cost Markov Decision Process (MDP), and we have applied exact policy iterationto solve a simple example in prior work. However, the overwhelmingcomputational requirements of exact policy iteration prevent its application forlarger problems. Approximate policy iteration overcomes this obstacle by approximating thecost-to-go using function approximation. Numerical simulation on the same example showsthat the proposed API algorithm leads to a policy with cost close to that of the optimalpolicy.Item An Asymptotically Efficient Algorithm for Finite Horizon Stochastic Dynamic Programming Problems(2003) Chang, Hyeong Soo; Fu, Michael C.; Marcus, Steven I.; Fu, Michael C.; Marcus, Steven I.; ISRWe present a novel algorithm, called ``Simulated Annealing Multiplicative Weights", for approximately solving large finite-horizon stochastic dynamic programming problems. The algorithm is ``asymptotically efficient" in the sense that a finite-time bound for the sample mean of the optimal value function over a given finite policy space can be obtained, and the bound approaches the optimal value as the number of iterations increases.The algorithm updates a probability distribution over the given policy space with a very simple rule, and the sequence of distributions generated by the algorithm converges to a distribution concentrated only on the optimal policies for the given policy space. We also discuss how to reduce the computational cost of the algorithm to apply it in practice.Item Comparing Gradient Estimation Methods Applied to Stochastic Manufacturing Systems(2000) Mellacheruvu, Praveen V.; Fu, Michael C.; Herrmann, Jeffrey W.; ISRThis paper compares two gradient estimation methods that can be usedfor estimating the sensitivities of output metrics with respectto the input parameters of a stochastic manufacturing system.A brief description of the methods used currently is followedby a description of the two methods: the finite difference methodand the simultaneous perturbation method. While the finitedifference method has been in use for a long time, simultaneousperturbation is a relatively new method which has beenapplied with stochastic approximation for optimizationwith good results. The methods described are used to analyzea stochastic manufacturing system and estimate gradients.The results are compared to the gradients calculated fromanalytical queueing system models.These gradient methods are of significant use in complex manufacturingsystems like semiconductor manufacturing systems where we havea large number of input parameters which affect the average total cycle time.These gradient estimation methods can estimate the impact thatthese input parameters have and identify theparameters that have the maximum impact on system performance.
Item Convergence of Sample Path Optimal Policies for Stochastic Dynamic Programming(2005) Fu, Michael C.; Jin, Xing; Fu, Michael C.; ISRWe consider the solution of stochastic dynamic programs using sample path estimates. Applying the theory of large deviations, we derive probability error bounds associated with the convergence of the estimated optimal policy to the true optimal policy, for finite horizon problems. These bounds decay at an exponential rate, in contrast with the usual canonical (inverse) square root rate associated with estimation of the value (cost-to-go) function itself. These results have practical implications for Monte Carlo simulation-based solution approaches to stochastic dynamic programming problems where it is impractical to extract the explicit transition probabilities of the underlying system model.Item A Distributed Algorithm for Solving a Class of Multi-agent Markov Decision Problems(2003) Chang, Hyeong Soo; Fu, Michael C.; Fu, Michael C.; ISRWe consider a class of infinite horizon Markov decision processes (MDPs) with multiple decision makers, called agents,and a general joint reward structure, but a special decomposable state/action structure such that each individual agent's actions affect the system's state transitions independently from the actions of all other agents. We introduce the concept of ``localization," where each agent need only consider a ``local" MDP defined on its own state and action spaces. Based on this localization concept, we propose an iterative distributed algorithm that emulates gradient ascent and which converges to a locally optimal solution for the average reward case. The solution is an ``autonomous" joint policy such that each agent's action is based on only its local state. Finally, we discuss the implication of the localization concept for discounted reward problems.Item Evolutionary Policy Iteration for Solving Markov Decision Processes(2002) Chang, Hyeong Soo; Lee, Hong-Gi; Fu, Michael C.; Marcus, Steven I.; ISR; CSHCNWe propose a novel algorithm called Evolutionary Policy Iteration (EPI) for solving infinite horizon discounted reward Markov Decision Process (MDP) problems. EPI inherits the spirit of the well-known PI algorithm but eliminates the need to maximize over the entire action space in the policy improvement step, so it should be most effective for problems with very large action spaces. EPI iteratively generates a "population" or a set of policies such that the performance of the "elite policy" for a population is monotonically improved with respect to a defined fitness function. EPI converges with probability one to a population whose elite policy is an optimal policy for a given MDP. EPI is naturally parallelizable and along this discussion, a distributed variant of PI is also studied.Item An Evolutionary Random Search Algorithm for Solving Markov Decision Processes(2005) Hu, Jiaqiao; Fu, Michael C.; Ramezani, Vahid; Marcus, Steven I.; Fu, Michael; ISRItem Gradient Estimation for Queues with Non-identical Servers(1993) Fu, Michael C.; Hu, Jian-Qiang; Nagi, Rakesh; ISRWe consider a single-queue system with multiple servers that are non-identical. Our interest is in applying the technique of perturbation analysis to estimate derivatives of mean steady- state system time. Because infinitesimal perturbation analysis yields biased estimates for this problem, we apply smoothed perturbation analysis to get unbiased estimators. In the most general cases, the estimators require additional simulation, so we propose an approximation to eliminate this. For two servers, we give an analytical proof of unbiasedness in steady state for the Markovian case. We provide simulation results for both Markovian and non-Markovian examples, and compare the performance with regenerative likelihood ratio estimators.Item Gradient Estimation of Two-Stage Continuous Transfer Lines Subject to Operation-Dependent Failures(1998) Fu, Michael C.; Xie, Xiaolan; ISRThis paper addresses the gradient estimation of transfer linescomprising two machines separated by a buffer of finite capacity. A continuous flow model is considered, where machines are subject tooperation-dependent failures, i.e., a machine cannot fail when it is idle. Both repair times and failure times may be general, i.e., they need not be exponentially distributed.The system is hybrid in the sense that it hasboth continuous dynamics, as a result of continuous material flow, and discrete events: failures and repairs. The purpose of this paper is to estimate the gradient of the throughput rate with respect to the buffer capacity. Both IPA estimators and SPA estimators are derived. Simulation results show that IPA estimators do not work, contradicting the common belief that IPA always works for continuous flow models.Item A Large Deviations Analysis of Quantile Estimation with Application to Value at Risk(2005-07-01T12:31:49Z) Jin, Xing; Fu, Michael C.Quantile estimation has become increasingly important, particularly in the financial industry, where Value-at-Risk has emerged as a standard measurement tool for controlling portfolio risk. In this paper we apply the theory of large deviations to analyze various simulation-based quantile estimators. First, we show that the coverage probability of the standard quantile estimator converges to one exponentially fast with sample size. Then we introduce a new quantile estimator that has a provably faster convergence rate. Furthermore, we show that the coverage probability for this new estimator can be guaranteed to be 100% with sufficiently large, but finite, sample size. Numerical experiments on a VaR example illustrate the potential for dramatic variance reduction.Item A Lower Bounding Result for the Optimal Policy in an Adaptive Staffing Problem(1998) Assad, Arjang A.; Fu, Michael C.; Yoo, Jae-sin; ISRWe derive a lower bound for the staffing levels required to meet a projected load in a retail service facility. We model the queueing system as a Markovian process with non-homogeneous Poisson arrivals. Motivated by an application from the postal services, we assume that the arrival rate is piecewise constant over the time horizon and retain such transient effects as build- up in the system. The optimal staffing decision is formulated as a multiperiod dynamic programming problem where staff is allocated to each time period to minimize the total costs over the horizon. The main result is the derivation of a lower bound on the staffing requirements that is computed by decoupling successive time periods.Item Minimizing Work-in-Process and Material Handling in the Facilities Layout Problem(1995) Fu, Michael C.; Kaku, Bharat K.; ISRWe consider the plant layout problem for a job shop environment. This problem is generally treated as a quadratic assignment problem with the objective of minimizing material handling costs. In this paper we investigate conditions under which the quadratic assignment solution also minimizes average work-in-process. To get some initial insights, we model the plant as an open queueing network and show that under a certain set of assumptions, the problem of minimizing work-in-process reduces to the quadratic assignment problem. We then investigate via simulation the robustness of this result. We found that the relationship between material handling costs and average work-in-process holds under much more general conditions than are assumed in the analytical model. For those cases where the result does not hold, we have refined the model by developing a simple secondary measure which appears to work well in practice.Item A Model Reference Adaptive Search Algorithm for Global Optimization(2005) Hu, Jiaqiao; Fu, Michael C.; Marcus, Steven I.; mfu; ISRItem Monotone Optimal Policies for a Transient Queueing Staffing Problem(1997) Fu, Michael C.; Marcus, Steven I.; Wang, I-Jeng; ISRWe consider the problem of determining the optimal policy for staffing a queueing system over multiple periods, using a model that takes into account transient queueing effects. Formulating the problem in a dynamic programming setting, we show that the optimal policy follows a monotone optimal control by establishing the submodularity of the objective function with respect to the staffing level and initial queue size in a period. In particular, this requires proving that the system occupancy in a G/M/s queue is submodular in the number of servers and initial system occupancy.Item Multi-Echelon Models for Repairable Items: A Review(2005-07-01T12:31:37Z) Diaz, Angel; Fu, Michael C.We review multi-echelon inventory models for repairable items. Such models have been widely applied to the management of critical spare parts for military equipment for around three decades, but the application to manufacturing and service industries seems to be much less documented. We feel that the appropriate use of models in the management of spare parts for heavily utilized equipment in industry can result in significant cost savings, in particular in those settings where repair facilities are resource constrained. In our review, we provide a strategic framework for making these decisions, place the modeling problem in the broader context of inventory control, and review the prominent models in the literature under a unified setting, highlighting some key relationships. We concentrate on describing those models which we feel are most applicable for practical application, revisiting in detail the Multi-Echelon Technique for Recoverable Item Control (METRIC) model and its variations, and then discussing a variety of more general queueing models. We then discuss the components which we feel must be addressed in the models in order to apply them practically to industrial settings.Item Online Appendix for “Ranking and Selection as Stochastic Control”(2017-04) Peng, Yijie; Chong, Edwin K. P.; Chen, Chun-Hung; Fu, Michael C.Item Optimal Multilevel Feedback Policies for ABR Flow Control using Two Timescale SPSA(1999) Bhatnagar, Shalabh; Fu, Michael C.; Marcus, Steven I.; ISROptimal multilevel feedback control policies for rate based flow controlin available bit rate (ABR) service in asynchronous transfer mode (ATM)networks are obtained in the presence of information and propagationdelays, using a numerically efficient two timescale simultaneousperturbation stochastic approximation (SPSA) algorithm. Convergenceanalysis of the algorithm is presented. Numerical experiments demonstratefast convergence even in the presence of significant delays and largenumber of parametrized policy levels.Item Optimal Multilevel Feedback Policies for ABR Flow Control using Two Timescale SPSA(1999) Bhatnagar, Shalabh; Fu, Michael C.; Marcus, Steven I.; ISROptimal multilevel control policies for rate based flow control in available bit rate (ABR) service in asynchronous transfer mode (ATM) networks are obtained in the presence of information and propagation delays, using a numerically efficient two timescale simultaneous perturbation stochastic approximation (SPSA) algorithm. Numerical experiments demonstrate fast convergence even in the presence of significant delays and a large number of parametrized parameter levels.