An Adaptive Sampling Algorithm for Solving Markov Decision Processes

dc.contributor.author: Chang, Hyeong Soo [en_US]
dc.contributor.author: Fu, Michael C. [en_US]
dc.contributor.author: Marcus, Steven I. [en_US]
dc.contributor.department: ISR [en_US]
dc.date.accessioned: 2007-05-23T10:11:56Z
dc.date.available: 2007-05-23T10:11:56Z
dc.date.issued: 2002 [en_US]
dc.description.abstract: Based on recent results for multi-armed bandit problems, we propose an adaptive sampling algorithm that approximates the optimal value of a finite horizon Markov decision process (MDP) with infinite state space but finite action space and bounded rewards. The algorithm adaptively chooses which action to sample as the sampling process proceeds, and it is proven that the estimate produced by the algorithm is asymptotically unbiased and the worst possible bias is bounded by a quantity that converges to zero at rate $O\left(\frac{H \ln N}{N}\right)$, where $H$ is the horizon length and $N$ is the total number of samples that are used per state sampled in each stage. The worst-case running-time complexity of the algorithm is $O((|A|N)^H)$, independent of the state space size, where $|A|$ is the size of the action space. The algorithm can be used to create an approximate receding horizon control to solve infinite horizon MDPs. [en_US]
dc.format.extent: 209781 bytes
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/1903/6264
dc.language.iso: en_US [en_US]
dc.relation.ispartofseries: ISR; TR 2002-19 [en_US]
dc.subject: Next-Generation Product Realization Systems [en_US]
dc.title: An Adaptive Sampling Algorithm for Solving Markov Decision Processes [en_US]
dc.type: Technical Report [en_US]
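
The adaptive action selection described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the report: it assumes a UCB1-style exploration bonus borrowed from the multi-armed bandit literature, a count-weighted average as the value estimate, and rewards scaled so that the bonus is meaningful. The names `adaptive_sample_value`, `simulate`, and `toy_simulate` are hypothetical, and `simulate(state, action, rng)` is assumed to return one sampled `(reward, next_state)` pair.

```python
import math
import random


def adaptive_sample_value(state, stage, horizon, actions, simulate, n_samples, rng):
    """Estimate the optimal value-to-go at `state` from `stage` to `horizon`.

    Illustrative sketch only: the UCB1-style rule below is an assumption,
    not necessarily the exact rule used in the report.
    """
    if stage == horizon:
        return 0.0  # no reward is collected after the final stage

    counts = {a: 0 for a in actions}    # how often each action was sampled
    totals = {a: 0.0 for a in actions}  # sum of sampled returns per action

    def sample(a):
        # One simulated transition plus a recursive estimate for the next stage.
        reward, next_state = simulate(state, a, rng)
        totals[a] += reward + adaptive_sample_value(
            next_state, stage + 1, horizon, actions, simulate, n_samples, rng
        )
        counts[a] += 1

    # Initialization: sample every action once.
    for a in actions:
        sample(a)

    # Adaptive phase: spend the remaining budget on the action with the
    # largest upper confidence bound (mean return plus exploration bonus).
    for n in range(len(actions), n_samples):
        best = max(
            actions,
            key=lambda a: totals[a] / counts[a]
            + math.sqrt(2.0 * math.log(n) / counts[a]),
        )
        sample(best)

    # Count-weighted average of the per-action mean returns.
    return sum(totals.values()) / sum(counts.values())


def toy_simulate(state, action, rng):
    """Hypothetical one-state MDP: the reward equals the action index."""
    return float(action), state


if __name__ == "__main__":
    rng = random.Random(0)
    estimate = adaptive_sample_value(0, 0, 2, [0, 1], toy_simulate, 8, rng)
    print(estimate)  # an estimate of the optimal value 2.0 for this toy MDP
```

Each call samples about `n_samples` transitions and recurses once per sample, so the work grows roughly like the $O((|A|N)^H)$ bound quoted in the abstract when `n_samples` is taken per action, and it never enumerates the state space. Because suboptimal actions receive some of the samples, the weighted average tends to sit below the true optimum for small `n_samples`.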

Files

Original bundle

Name: TR_2002-19.pdf
Size: 204.86 KB
Format: Adobe Portable Document Format