Solving POMDP by On-Policy Linear Approximate Learning Algorithm

Files
TR_99-68.pdf(231.77 KB)
Publication or External Link
Date
1999
Authors
He, Qiming
Shayman, Mark A.
Advisor
Shayman, Mark A.
Citation
DRUM DOI
Abstract
This paper presents a fast Reinforcement Learning (RL) algorithm to solve the Partially Observable Markov Decision Process (POMDP) problem. The proposed algorithm is devised to provide a policy-making framework for Network Management Systems (NMS), which is in essence an engineering application without an exact model.

The algorithm consists of two phases. First, the model is estimated and a policy is learned in a completely observable simulator. Second, the estimated model is brought into the partially observable real world, where the learned policy is fine-tuned.

The learning algorithm is based on the on-policy linear gradient-descent learning algorithm with eligibility traces. This means that the Q-value over the belief space is linearly approximated by the Q-values at the vertices of the belief space, to which the on-line TD method is applied.

The proposed algorithm is tested against exact solutions on extensive small/middle-size benchmark examples from the POMDP literature and is found to be near optimal in terms of average discounted reward and steps-to-goal. The proposed algorithm significantly reduces the convergence time and can easily be adapted to problems with a large number of states.
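As an informal illustration of the approximation described in the abstract, the sketch below shows an on-policy gradient-descent TD update with eligibility traces (SARSA(λ)-style) in which the Q-value over the belief space is a linear combination of per-vertex Q-values. This is not the authors' code; all names and hyperparameter values (n_states, alpha, gamma, lam, epsilon, etc.) are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: Q(b, a) is approximated linearly by the Q-values at the
# vertices of the belief simplex, Q(b, a) ≈ sum_s b[s] * q[s, a], and updated
# on-policy with TD(lambda) eligibility traces. Hyperparameters are arbitrary.

n_states, n_actions = 4, 2
alpha, gamma, lam, epsilon = 0.1, 0.95, 0.9, 0.1

q = np.zeros((n_states, n_actions))   # Q-values at the simplex vertices
e = np.zeros_like(q)                  # eligibility traces

def q_value(belief, action):
    """Linear approximation of Q over the belief space."""
    return belief @ q[:, action]

def epsilon_greedy(belief):
    """On-policy action selection over the approximated Q-values."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax([q_value(belief, a) for a in range(n_actions)]))

def sarsa_lambda_step(belief, action, reward, next_belief, next_action, done):
    """One on-policy gradient-descent TD(lambda) update.

    The gradient of Q(b, a) with respect to q[:, a] is the belief vector
    itself, so the trace for the chosen action accumulates the current belief.
    """
    global q, e
    target = reward if done else reward + gamma * q_value(next_belief, next_action)
    delta = target - q_value(belief, action)   # TD error
    e = gamma * lam * e                        # decay all traces
    e[:, action] += belief                     # accumulate gradient of Q(b, a)
    q = q + alpha * delta * e                  # gradient-descent TD update
```

Read against the two-phase scheme in the abstract: in the completely observable simulator the belief is a unit vector at the true state, so the update reduces to a tabular on-policy TD update; in the partially observed setting the belief would instead come from a filter built on the estimated model.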
Notes
Rights