Topics in Stochastic Optimization
In this thesis, we study three topics in stochastic optimization: ranking and selection (R&S), multi-armed bandits (MAB), and stochastic kriging (SK). For R&S, we first consider the problem of making inferences about all candidates based on samples drawn from one. We then study the design of efficient allocation algorithms for problems where the selection objective is more complex than the simple expectation of a random output. For MAB, we use an autoregressive process to capture possible temporal correlations in the unknown reward processes and study the effect of such correlations on the regret bounds of various bandit algorithms. Lastly, for SK, we design a dynamic experimental design procedure that establishes a good global fit by efficiently allocating the simulation budget across the design space.
The first two chapters of the thesis address variations of the R&S problem. In Chapter 1, we consider the problem of choosing the best design alternative under a small simulation budget, where making inferences about all alternatives from a single observation could enhance the probability of correct selection (PCS). We propose a new selection rule that exploits the relative similarity between pairs of alternatives and show that it improves PCS compared to selection based on the collected sample averages. We illustrate its effectiveness by applying our selection index to simulated R&S problems under two well-known budget allocation policies. In Chapter 2, we present two sequential allocation frameworks for selecting from a set of competing alternatives when the decision maker cares about more than just the simple expected rewards. The frameworks are built on general parametric reward distributions and assume that the objective of selection, which we refer to as utility, can be expressed as a function of the parameters governing the reward distributions. The first algorithm, which we call utility-based OCBA (UOCBA), uses the delta method to find the asymptotic distribution of a utility estimator and establishes the asymptotically optimal allocation by solving the corresponding constrained optimization problem. The second, which we refer to as the utility-based value of information (UVoI) approach, is a variation of the Bayesian value of information (VoI) techniques for efficient learning of the utility. We establish the asymptotic optimality of both allocation policies and illustrate the performance of the two algorithms through numerical experiments.
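As a rough illustration of the delta-method step mentioned for UOCBA, the sketch below approximates the variance of a plug-in utility estimator as grad' * Sigma * grad / n. The mean-variance utility u(mu, sigma^2) = mu - lam * sigma^2, the normal reward model, and all function names are assumptions chosen for illustration; they are not taken from the thesis.

```python
def delta_method_variance(grad, cov, n):
    """Approximate Var[u(theta_hat)] by grad' * cov * grad / n.

    grad: gradient of the utility at the true parameters (list of floats)
    cov:  asymptotic covariance of sqrt(n) * (theta_hat - theta)
    n:    number of samples drawn from the alternative
    """
    k = len(grad)
    quad = sum(grad[i] * cov[i][j] * grad[j]
               for i in range(k) for j in range(k))
    return quad / n


def mean_variance_utility_variance(sigma2, lam, n):
    # Hypothetical utility u(mu, sigma2) = mu - lam * sigma2 (an assumption).
    # For normal rewards, the sample mean and sample variance are
    # asymptotically independent with variances sigma2 and 2 * sigma2**2.
    grad = [1.0, -lam]
    cov = [[sigma2, 0.0],
           [0.0, 2.0 * sigma2 ** 2]]
    return delta_method_variance(grad, cov, n)
```

An allocation rule in this spirit would compare such variance approximations across alternatives when deciding where to spend the next samples; the constrained optimization that yields the allocation itself is not reproduced here.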
Chapter 3 considers the restless bandit problem where the rewards on the arms are stochastic processes with strong temporal correlations that can be characterized by the well-known stationary autoregressive-moving-average time series models. We argue that despite the statistical stationarity of the reward processes, a linear improvement in cumulative reward can be obtained by exploiting the temporal correlation, compared to policies that work under the independent reward assumption. We introduce the notion of temporal exploration-exploitation trade-off, where a policy has to balance between learning more recent information to track the evolution of all reward processes and utilizing currently available predictions to gain better immediate reward. We prove a regret lower bound characterized by the bandit problem complexity and correlation strength along the time index and propose policies that achieve a matching upper bound.
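To make the temporal trade-off concrete, the sketch below uses a stationary AR(1) reward process, the simplest special case of the models named above, chosen here purely for illustration. It shows how the forecast for an arm reverts to the long-run mean, and how the forecast error variance grows, as the last observation of that arm becomes staler. The function and parameter names are hypothetical.

```python
def ar1_forecast(mu, phi, last_obs, k):
    """k-step-ahead forecast of X_t = mu + phi * (X_{t-1} - mu) + noise.

    Assumes |phi| < 1 (stationarity). The forecast decays geometrically
    from the last observation toward the long-run mean mu.
    """
    return mu + phi ** k * (last_obs - mu)


def ar1_forecast_variance(phi, noise_var, k):
    """Variance of the k-step-ahead forecast error for the same AR(1) model.

    Equals noise_var * (1 + phi**2 + ... + phi**(2*(k-1))), which rises
    toward the stationary variance noise_var / (1 - phi**2) as k grows.
    """
    return noise_var * (1.0 - phi ** (2 * k)) / (1.0 - phi ** 2)
```

An arm that has not been pulled for k rounds therefore carries a less informative prediction, which is why a policy may need to revisit arms that currently look inferior.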
Lastly, Chapter 4 proposes a fully sequential experimental design procedure for the stochastic kriging (SK) methodology of fitting unknown response surfaces from simulation experiments. The procedure first estimates the performance of the current SK model by jackknifing the existing data points. Then, an additional SK model is fitted to the jackknife error estimates to capture the landscape of the current SK model performance. Methodologies for balancing the exploration-exploitation trade-off in Bayesian optimization are employed to select the next simulation point. Unlike existing experimental design procedures, which rely on the posterior uncertainty estimates from the fitted SK model to evaluate model performance, our method is robust to the SK model specifications. We design a dynamic allocation algorithm, which we call kriging-based dynamic stochastic kriging (KDSK), and illustrate its performance through two numerical experiments.
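The jackknife step can be sketched as a leave-one-out loop over the design points. To keep the example self-contained, a one-dimensional inverse-distance-weighted interpolator stands in for the SK predictor; this substitution is an assumption made for illustration and is not the model used in the thesis. The second-stage model fitted to the errors and the acquisition rule are omitted.

```python
def idw_predict(xs, ys, x, power=2.0):
    """Inverse-distance-weighted prediction at x (stand-in surrogate, not SK)."""
    num = 0.0
    den = 0.0
    for xi, yi in zip(xs, ys):
        d = abs(x - xi)
        if d == 0.0:
            return yi  # exact hit on a design point
        w = d ** -power
        num += w * yi
        den += w
    return num / den


def leave_one_out_errors(xs, ys):
    """Absolute error at each design point when it is held out of the fit."""
    errors = []
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        errors.append(abs(ys[i] - idw_predict(rest_x, rest_y, xs[i])))
    return errors
```

Design points with large held-out errors mark regions where the current surrogate fits poorly, which is the kind of signal a second-stage model could then interpolate over the design space.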