Please use this identifier to cite or link to this item: http://hdl.handle.net/1903/6264

Title: An Adaptive Sampling Algorithm for Solving Markov Decision Processes
Authors: Chang, Hyeong Soo; Fu, Michael C.; Marcus, Steven I.
Department/Program: ISR
Type: Technical Report
Keywords: Next-Generation Product Realization Systems
Issue Date: 2002
Series/Report no.: ISR; TR 2002-19
Abstract: Based on recent results for multi-armed bandit problems, we propose an adaptive sampling algorithm that approximates the optimal value of a finite horizon Markov decision process (MDP) with infinite state space but finite action space and bounded rewards. The algorithm adaptively chooses which action to sample as the sampling process proceeds, and it is proven that the estimate produced by the algorithm is asymptotically unbiased and the worst possible bias is bounded by a quantity that converges to zero at rate $O\left(\frac{H \ln N}{N}\right)$, where $H$ is the horizon length and $N$ is the total number of samples that are used per state sampled in each stage. The worst-case running-time complexity of the algorithm is $O((|A|N)^H)$, independent of the state space size, where $|A|$ is the size of the action space. The algorithm can be used to create an approximate receding horizon control to solve infinite horizon MDPs.
URI: http://hdl.handle.net/1903/6264
Appears in Collections:Institute for Systems Research Technical Reports
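
Illustrative sketch (not taken from the report): the abstract describes a recursive sampler that, at each sampled state, spends N samples across the actions using a multi-armed bandit rule and returns a value estimate. The Python below is one way such a scheme could look, assuming a UCB1-style index (sample mean plus sqrt(2 ln n / count)) and an estimate formed as the count-weighted average of the per-action estimates. The names ams_value, simulate, and toy_simulate are hypothetical, the exploration bonus is left unscaled, and the report's exact sampling rule and estimator may differ.

```python
import math
import random


def ams_value(state, stage, H, N, actions, simulate, rng):
    """Estimate the optimal value-to-go from `state` at `stage` of an H-stage MDP.

    `simulate(state, action, rng)` is an assumed generative model that
    returns a (next_state, reward) pair.
    """
    if stage == H:
        return 0.0  # nothing left to collect after the final stage

    counts = {a: 0 for a in actions}    # samples drawn per action
    totals = {a: 0.0 for a in actions}  # summed sampled returns per action

    def sample(a):
        next_state, reward = simulate(state, a, rng)
        future = ams_value(next_state, stage + 1, H, N, actions, simulate, rng)
        totals[a] += reward + future
        counts[a] += 1

    # Initialization: sample every action once.
    for a in actions:
        sample(a)

    # Adaptive phase: spend the remaining budget on the action with the
    # largest UCB1-style index (assumed rule, see lead-in).
    for n in range(len(actions), N):
        best = max(
            actions,
            key=lambda a: totals[a] / counts[a]
            + math.sqrt(2.0 * math.log(n) / counts[a]),
        )
        sample(best)

    # Count-weighted average of per-action means, which reduces to the
    # mean of all sampled returns at this state.
    return sum(totals.values()) / sum(counts.values())


def toy_simulate(state, action, rng):
    """Hypothetical two-action MDP: action 1 pays 1, action 0 pays 0."""
    return state, float(action)
```

For example, ams_value(0, 0, 2, 20, [0, 1], toy_simulate, random.Random(0)) estimates the optimal two-stage value of the toy model, whose true value is 2. Each call issues about N recursive calls per stage and evaluates |A| indices per sample, which matches the exponential-in-H, state-space-independent cost stated in the abstract.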

Files in This Item:

File: TR_2002-19.pdf
Size: 204.86 kB
Format: Adobe PDF
No. of Downloads: 308

All items in DRUM are protected by copyright, with all rights reserved.

 
