An Adaptive Sampling Algorithm for Solving Markov Decision Processes

Chang, Hyeong Soo; Fu, Michael C.; Marcus, Steven I.

dc.contributor.author	Chang, Hyeong Soo	en_US
dc.contributor.author	Fu, Michael C.	en_US
dc.contributor.author	Marcus, Steven I.	en_US
dc.contributor.department	ISR	en_US
dc.date.accessioned	2007-05-23T10:11:56Z
dc.date.available	2007-05-23T10:11:56Z
dc.date.issued	2002	en_US
dc.description.abstract	Based on recent results for multi-armed bandit problems, we propose an adaptive sampling algorithm that approximates the optimal value of a finite horizon Markov decision process (MDP) with infinite state space but finite action space and bounded rewards. The algorithm adaptively chooses which action to sample as the sampling process proceeds, and it is proven that the estimate produced by the algorithm is asymptotically unbiased and the worst possible bias is bounded by a quantity that converges to zero at rate $O\left(\frac{H \ln N}{N}\right)$, where $H$ is the horizon length and $N$ is the total number of samples that are used per state sampled in each stage. The worst-case running-time complexity of the algorithm is $O((|A|N)^H)$, independent of the state space size, where $|A|$ is the size of the action space. The algorithm can be used to create an approximate receding horizon control to solve infinite horizon MDPs.	en_US
dc.format.extent	209781 bytes
dc.format.mimetype	application/pdf
dc.identifier.uri	http://hdl.handle.net/1903/6264
dc.language.iso	en_US	en_US
dc.relation.ispartofseries	ISR; TR 2002-19	en_US
dc.subject	Next-Generation Product Realization Systems	en_US
dc.title	An Adaptive Sampling Algorithm for Solving Markov Decision Processes	en_US
dc.type	Technical Report	en_US
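
The abstract describes the sampling scheme only in outline. The following is a minimal sketch of one way such a scheme could look, not the report's own pseudocode: it assumes a UCB1-style index (sample mean plus sqrt(2 ln n / count)) for choosing which action to sample next, and a count-weighted average of the per-action estimates as the value estimate. The names adaptive_sample_value, simulate, and toy_simulate are hypothetical, and the scaling of the exploration term is illustrative only.

```python
import math
import random


def adaptive_sample_value(state, stage, horizon, actions, simulate, n_samples, rng):
    """Estimate the optimal value-to-go at `state` from `stage` up to `horizon`.

    `simulate(state, action, rng)` is a hypothetical simulator that returns
    (reward, next_state). Rewards are assumed to be bounded.
    """
    if stage == horizon:
        return 0.0  # no reward is collected past the horizon

    counts = {a: 0 for a in actions}    # times each action was sampled here
    totals = {a: 0.0 for a in actions}  # summed sampled returns per action

    def sample(action):
        reward, next_state = simulate(state, action, rng)
        # Recursively estimate the value of the sampled next state.
        future = adaptive_sample_value(
            next_state, stage + 1, horizon, actions, simulate, n_samples, rng
        )
        counts[action] += 1
        totals[action] += reward + future

    # Initialization: sample every action once.
    for a in actions:
        sample(a)

    # Adaptive phase: spend the remaining budget on the action with the
    # largest UCB1-style index (assumed form, see the lead-in).
    for n in range(len(actions), n_samples):
        best = max(
            actions,
            key=lambda a: totals[a] / counts[a]
            + math.sqrt(2.0 * math.log(n) / counts[a]),
        )
        sample(best)

    # Count-weighted average of the per-action sample means.
    return sum(totals.values()) / sum(counts.values())


def toy_simulate(state, action, rng):
    """Hypothetical two-action toy model: action 1 pays 1, action 0 pays 0."""
    return float(action), state


estimate = adaptive_sample_value(0, 0, 2, [0, 1], toy_simulate, 8, random.Random(0))
print(estimate)
```

Each sampled next state triggers a fresh recursive estimate with the same per-state budget, so the number of simulator calls grows exponentially in the horizon but does not depend on the number of states, in line with the complexity statement in the abstract.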

Files

Original bundle

Name: TR_2002-19.pdf
Size: 204.86 KB
Format: Adobe Portable Document Format

Collections

Institute for Systems Research Technical Reports