Solving POMDP by On-Policy Linear Approximate Learning Algorithm
Abstract
This paper presents a fast Reinforcement Learning (RL) algorithm for solving Partially Observable Markov Decision Process (POMDP) problems. The proposed algorithm is devised to provide a policy-making framework for Network Management Systems (NMS), an engineering application for which no exact model is available.
The algorithm consists of two phases. First, the model is estimated and a policy is learned in a completely observable simulator. Second, the estimated model is carried into the partially observed real world, where the learned policy is fine-tuned.
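For concreteness, a minimal sketch of the policy-learning part of the first phase is given below. It assumes a fully observable simulator exposing `reset()` and `step(action)` returning `(next_state, reward, done)`; the interface, the function name, and the use of plain tabular on-policy SARSA at the simplex vertices are illustrative assumptions, not the thesis's exact procedure, and model estimation is omitted.

```python
import numpy as np

def learn_vertex_q_values(sim, n_states, n_actions, episodes=5000,
                          alpha=0.1, gamma=0.95, epsilon=0.1):
    """Phase 1 (sketch): tabular on-policy SARSA in the fully observable
    simulator. The returned table holds the Q-values at the vertices of the
    belief simplex, which seed the belief-space learner of Phase 2."""
    q = np.zeros((n_states, n_actions))

    def epsilon_greedy(s):
        if np.random.rand() < epsilon:
            return np.random.randint(n_actions)
        return int(np.argmax(q[s]))

    for _ in range(episodes):
        s = sim.reset()
        a = epsilon_greedy(s)
        done = False
        while not done:
            s_next, r, done = sim.step(a)
            a_next = epsilon_greedy(s_next)
            # Standard on-policy TD(0) update on the underlying MDP
            target = r if done else r + gamma * q[s_next, a_next]
            q[s, a] += alpha * (target - q[s, a])
            s, a = s_next, a_next
    return q
```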
The learning algorithm is based on the on-policy linear gradient-descent learning algorithm with eligibility traces. This means that the Q-value over the belief space is linearly approximated by the Q-values at the vertices of the belief simplex, to which an on-line TD method is applied.
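As an illustration of this construction, the sketch below implements on-policy SARSA(λ) with linear function approximation in which the belief vector itself is the feature vector, so that Q(b, a) = Σ_s b(s)·q(s, a) interpolates the vertex Q-values. The class name, hyperparameters, and interface are hypothetical and not taken from the thesis.

```python
import numpy as np

class BeliefSarsaLambda:
    """Sketch of on-policy linear TD (SARSA(lambda)) over the belief simplex.
    q[s, a] are the Q-values at the simplex vertices; Q(b, a) = b . q[:, a]."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95,
                 lam=0.9, epsilon=0.1):
        self.q = np.zeros((n_states, n_actions))   # vertex Q-values (the weights)
        self.e = np.zeros_like(self.q)             # eligibility traces
        self.alpha, self.gamma, self.lam, self.epsilon = alpha, gamma, lam, epsilon

    def q_values(self, belief):
        # Linear approximation: Q(b, a) = sum_s b(s) * q[s, a] for each action
        return belief @ self.q

    def select_action(self, belief):
        # Epsilon-greedy over the approximated Q-values
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.q.shape[1])
        return int(np.argmax(self.q_values(belief)))

    def update(self, belief, action, reward, next_belief, next_action, done):
        # On-policy TD error on the approximated Q-value
        target = reward if done else reward + self.gamma * self.q_values(next_belief)[next_action]
        delta = target - self.q_values(belief)[action]
        # Accumulating traces: the gradient of Q(b, a) w.r.t. q[:, a] is b
        self.e *= self.gamma * self.lam
        self.e[:, action] += belief
        self.q += self.alpha * delta * self.e

    def reset_traces(self):
        # Clear traces at the start of each episode
        self.e.fill(0.0)
```

In this sketch the vertex Q-values returned by the Phase 1 routine above could be used to initialize `self.q` before fine-tuning on belief states.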
The proposed algorithm is tested against the exact solutions of an extensive set of small/middle-size benchmark examples from the POMDP literature and is found to be near optimal in terms of average discounted reward and steps to goal. The proposed algorithm significantly reduces convergence time and can easily be adapted to problems with a large number of states.