Mathematics
Permanent URI for this communityhttp://hdl.handle.net/1903/2261
Browse
2 results
Search Results
Item Simulation Optimization: Efficient Selection and Dynamic Programming(2023) Zhou, Yi; Fu, Michael C; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)Many real-world problems can be modeled as stochastic systems, whose performances can be estimated by simulations. Important topics include statistical ranking and selection (R&S) problems, response surface problems, and stochastic estimation problems. The first part of the dissertation focuses on the R&S problems, where there is a finite set of feasible solutions ("systems" or "alternatives") with unknown values of performances that must be estimated from noisy observations, e.g., from expensive stochastic simulation experiments. The goal in R&S problems is to identify the highest-valued alternative as quickly as possible. In many practical settings, alternatives with similar attributes could have similar performances; we thus focus on such settings where prior similarity information between the performance outputs of different alternatives can be learned from data. Incorporating similarity information enables efficient budget allocation for faster identification of the best alternative in sequential selection. Using a new selection criterion, the similarity selection index, we develop two new allocation methods: one based on a mathematical programming characterization of the asymptotically optimal budget allocation, and the other based on a myopic expected improvement measure. For the former, we present a novel sequential implementation that provably learns the optimal allocation without tuning. For the latter, we also derive its asymptotic sampling ratios. We also propose a practical way to update the prior similarity information as new samples are collected. The second part of the dissertation considers the stochastic estimation problems of estimating a conditional expectation. We first formulate the conditional expectation as a ratio of two stochastic derivatives. Applying stochastic gradient estimation techniques, we express the two derivatives using ordinary expectations, which can be then estimated by Monte Carlo methods. Based on an empirical distribution estimated from simulation, we provide guidance on selecting the appropriate formulation of the derivatives to reduce the variance of the ratio estimator. The third part of this dissertation further explores the application of estimators for conditional expectations in policy evaluation, an important topic in stochastic dynamic programming. In particular, we focus on a finite-horizon setting with continuous state variables. The Bellman equation represents the value function as a conditional expectation. By using the finite difference method and the generalized likelihood ratio method, we propose new estimators for policy evaluation and show how the value of any given state can be estimated using sample paths starting from various other states.Item Topics in Stochastic Optimization(2019) Sun, Guowei N/A; Fu, Michael C; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)In this thesis, we work with three topics in stochastic optimization: ranking and selection (R&S), multi-armed bandits (MAB) and stochastic kriging (SK). For R&S, we first consider the problem of making inferences about all candidates based on samples drawn from one. Then we study the problem of designing efficient allocation algorithms for problems where the selection objective is more complex than the simple expectation of a random output. In MAB, we use the autoregressive process to capture possible temporal correlations in the unknown reward processes and study the effect of such correlations on the regret bounds of various bandit algorithms. Lastly, for SK, we design a procedure for dynamic experimental design for establishing a good global fit by efficiently allocating simulation budgets in the design space. The first two Chapters of the thesis work with variations of the R&S problem. In Chapter 1, we consider the problem of choosing the best design alternative under a small simulation budget, where making inferences about all alternatives from a single observation could enhance the probability of correct selection. We propose a new selection rule exploiting the relative similarity between pairs of alternatives and show its improvement on selection performance, evaluated by the Probability of Correct Selection, compared to selection based on collected sample averages. We illustrate the effectiveness by applying our selection index on simulated R\&S problems using two well-known budget allocation policies. In Chapter 2, we present two sequential allocation frameworks for selecting from a set of competing alternatives when the decision maker cares about more than just the simple expected rewards. The frameworks are built on general parametric reward distributions and assume the objective of selection, which we refer to as utility, can be expressed as a function of the governing reward distributional parameters. The first algorithm, which we call utility-based OCBA (UOCBA), uses the Delta-technique to find the asymptotic distribution of a utility estimator to establish the asymptotically optimal allocation by solving the corresponding constrained optimization problem. The second, which we refer to as utility-based value of information (UVoI) approach, is a variation of the Bayesian value of information (VoI) techniques for efficient learning of the utility. We establish the asymptotic optimality of both allocation policies and illustrate the performance of the two algorithms through numerical experiments. Chapter 3 considers the restless bandit problem where the rewards on the arms are stochastic processes with strong temporal correlations that can be characterized by the well-known stationary autoregressive-moving-average time series models. We argue that despite the statistical stationarity of the reward processes, a linear improvement in cumulative reward can be obtained by exploiting the temporal correlation, compared to policies that work under the independent reward assumption. We introduce the notion of temporal exploration-exploitation trade-off, where a policy has to balance between learning more recent information to track the evolution of all reward processes and utilizing currently available predictions to gain better immediate reward. We prove a regret lower bound characterized by the bandit problem complexity and correlation strength along the time index and propose policies that achieve a matching upper bound. Lastly, Chapter 4 proposes a fully sequential experimental design procedure for the stochastic kriging (SK) methodology of fitting unknown response surfaces from simulation experiments. The procedure first estimates the current SK model performance by jackknifing the existing data points. Then, an additional SK model is fitted on the jackknife error estimates to capture the landscape of the current SK model performance. Methodologies for balancing exploration and exploitation trade-off in Bayesian optimization are employed to select the next simulation point. Compared to existing experimental design procedures relying on the posterior uncertainty estimates from the fitted SK model for evaluating model performance, our method is robust to the SK model specifications. We design a dynamic allocation algorithm, which we call kriging-based dynamic stochastic kriging (KDSK), and illustrate its performance through two numerical experiments.