Optimal Learning with Non-Gaussian Rewards

Ding, Zi

Files

Ding_umd_0117E_15503.pdf (5.52 MB)

No. of downloads: 925

Date

2014

Authors

Ding, Zi

Advisor

Ryzhov, Ilya O.

DRUM DOI

https://doi.org/10.13016/M2PS3X

Abstract

In this dissertation, we study sequential Bayesian learning problems modeled under non-Gaussian distributions. We focus on a class of problems called the multi-armed bandit problem and study its optimal learning strategy, the Gittins index policy. The Gittins index is computationally intractable, and approximation methods have been developed for Gaussian reward problems. We construct a novel theoretical and computational framework for the Gittins index under non-Gaussian rewards. By interpolating the rewards using continuous-time conditional Lévy processes, we recast the optimal stopping problems that characterize Gittins indices into free-boundary partial integro-differential equations (PIDEs). We also provide additional structural properties and numerical illustrations of how our approach can be used to approximate the Gittins index.
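To make the optimal stopping characterization of the Gittins index concrete, the sketch below approximates the index for one simple non-Gaussian case: Bernoulli rewards with a Beta(a, b) prior. It is not the PIDE-based method developed in the dissertation. It uses the standard calibration (retirement) formulation with a truncated horizon and bisection instead. The function name `gittins_index_bernoulli` and the parameters `gamma`, `horizon`, and `tol` are assumptions chosen for illustration.

```python
def gittins_index_bernoulli(a, b, gamma=0.9, horizon=200, tol=1e-6):
    """Approximate the Gittins index of a Bernoulli arm with a Beta(a, b) prior.

    Illustrative sketch only: calibration against a fixed retirement reward,
    solved by backward induction over a truncated horizon plus bisection.
    Assumes 0 <= gamma < 1 and a, b > 0.
    """

    def continuation_minus_retirement(lam):
        # Value of retiring now: receive lam in every period forever.
        retire = lam / (1.0 - gamma)

        # Terminal layer after `horizon` pulls: state (a + i, b + horizon - i).
        # Truncation: either retire or keep earning the posterior mean forever.
        values = [
            max(retire, (a + i) / (a + b + horizon) / (1.0 - gamma))
            for i in range(horizon + 1)
        ]

        # Backward induction on the optimal stopping problem.
        for depth in range(horizon - 1, -1, -1):
            n = a + b + depth
            layer = []
            for i in range(depth + 1):
                p = (a + i) / n  # posterior mean after i successes in `depth` pulls
                cont = p * (1.0 + gamma * values[i + 1]) + (1.0 - p) * gamma * values[i]
                # At the root we want the value of pulling at least once.
                layer.append(cont if depth == 0 else max(retire, cont))
            values = layer

        return values[0] - retire

    # The index is the retirement reward at which pulling and retiring tie.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if continuation_minus_retirement(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(gittins_index_bernoulli(1, 1, gamma=0.9))
```

The returned index exceeds the posterior mean a / (a + b) whenever gamma is positive, which reflects the value of the information gained by pulling the arm. The truncation error shrinks on the order of gamma raised to the power `horizon`, so a larger `horizon` is needed as gamma approaches 1.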

URI (handle)

http://hdl.handle.net/1903/15774

Collections

UMD Theses and Dissertations
Computer Science Theses and Dissertations
Mathematics Theses and Dissertations
