Optimal Learning with Non-Gaussian Rewards

Thumbnail Image
Files
Publication or External Link
Date
2014
Authors
Ding, Zi
Advisor
Ryzhov, Ilya O.
Citation
Abstract
In this disseration, the author studies sequential Bayesian learning problems modeled under non-Gaussian distributions. We focus on a class of problems called the multi-armed bandit problem, and studies its optimal learning strategy, the Gittins index policy. The Gittins index is computationally intractable and approxi- mation methods have been developed for Gaussian reward problems. We construct a novel theoretical and computational framework for the Gittins index under non- Gaussian rewards. By interpolating the rewards using continuous-time conditional Levy processes, we recast the optimal stopping problems that characterize Gittins indices into free-boundary partial integro-differential equations (PIDEs). We also provide additional structural properties and numerical illustrations on how our ap- proach can be used to approximate the Gittins index.
Notes
Rights