Institute for Systems Research

Permanent URI for this community: http://hdl.handle.net/1903/4375

Search Results

Now showing 1 - 3 of 3
  • Item
    Solving Continuous-State POMDPs via Density Projection
    (2007) Zhou, Enlu; Fu, Michael C.; Marcus, Steven I.
    Research on numerical solution methods for partially observable Markov decision processes (POMDPs) has primarily focused on discrete-state models, and these algorithms do not generally extend to continuous-state POMDPs, due to the infinite dimensionality of the belief space. In this paper, we develop a computationally viable and theoretically sound method for solving continuous-state POMDPs by effectively reducing the dimensionality of the belief space via density projections. The density projection technique is also incorporated into particle filtering to provide a filtering scheme for online decision making. We provide an error bound between the value function induced by the policy obtained by our method and the true value function of the POMDP, and also an error bound between the projection particle filtering and the optimal filtering. Finally, we illustrate the effectiveness of our method through an inventory control problem.
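
    The projection idea can be illustrated with a minimal sketch, assuming a scalar state, a Gaussian projection family, and hypothetical transition_sample and obs_likelihood callables standing in for the system and observation models; the paper's actual algorithm and error bounds are more general than this illustration.

    import numpy as np

    def project_to_gaussian(particles, weights):
        # Moment-matching projection of a weighted particle belief onto a
        # Gaussian family (the minimum-KL projection onto that family).
        mean = np.average(particles, weights=weights)
        var = np.average((particles - mean) ** 2, weights=weights)
        return mean, var

    def projection_particle_filter_step(particles, weights, action, obs,
                                        transition_sample, obs_likelihood,
                                        rng=None):
        # One step of a projection particle filter: propagate, reweight,
        # project the updated belief onto the Gaussian family, then resample
        # from the projected density so the particle set stays well-behaved.
        rng = np.random.default_rng() if rng is None else rng
        n = len(particles)
        propagated = np.array([transition_sample(x, action, rng) for x in particles])
        w = weights * np.array([obs_likelihood(obs, x) for x in propagated])
        w = w / w.sum()
        mean, var = project_to_gaussian(propagated, w)
        new_particles = rng.normal(mean, np.sqrt(var), size=n)
        new_weights = np.full(n, 1.0 / n)
        return new_particles, new_weights, (mean, var)
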
  • Item
    Solving POMDP by On-Policy Linear Approximate Learning Algorithm
    (1999) He, Qiming; Shayman, Mark A.; ISR
    This paper presents a fast Reinforcement Learning (RL) algorithm for solving Partially Observable Markov Decision Process (POMDP) problems. The proposed algorithm is devised to provide a policy-making framework for Network Management Systems (NMS), which are in essence engineering applications without an exact model.

    The algorithm consists of two phases. First, the model is estimated and a policy is learned in a completely observable simulator. Second, the estimated model is brought into the partially observed real world, where the learned policy is then fine-tuned.

    The learning algorithm is based on on-policy linear gradient-descent learning with eligibility traces. This means that the Q-value over the belief space is linearly approximated from the Q-values at the vertices of the belief simplex, to which an on-line TD method is applied.

    The proposed algorithm is tested against the exact solutions of extensive small- and middle-size benchmark examples from the POMDP literature and is found to be near optimal in terms of average discounted reward and steps to goal. The proposed algorithm significantly reduces the convergence time and can easily be adapted to problems with large state spaces.
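
    The linear belief-space value representation and on-line TD update described in the abstract above can be sketched roughly as follows, assuming a hypothetical simulator object env with reset()/step(a) methods that return belief vectors (e.g. from a Bayes filter); the paper's two-phase training and network-management specifics are not reproduced here.

    import numpy as np

    def sarsa_lambda_over_beliefs(env, n_states, n_actions, n_episodes=500,
                                  alpha=0.1, gamma=0.95, lam=0.9, epsilon=0.1,
                                  rng=None):
        # On-policy linear TD learning (SARSA(lambda)) where the Q-value of a
        # belief is interpolated from the Q-values at the simplex vertices:
        #   Q(b, a) = sum_s b[s] * q[s, a]
        rng = np.random.default_rng() if rng is None else rng
        q = np.zeros((n_states, n_actions))          # vertex Q-values (weights)

        def q_value(belief, a):
            return float(belief @ q[:, a])

        def choose(belief):                          # epsilon-greedy policy
            if rng.random() < epsilon:
                return int(rng.integers(n_actions))
            return int(np.argmax([q_value(belief, a) for a in range(n_actions)]))

        for _ in range(n_episodes):
            belief = env.reset()
            action = choose(belief)
            trace = np.zeros_like(q)                 # eligibility traces
            done = False
            while not done:
                next_belief, reward, done = env.step(action)
                next_action = choose(next_belief)
                td_error = reward - q_value(belief, action)
                if not done:
                    td_error += gamma * q_value(next_belief, next_action)
                trace *= gamma * lam                 # decay existing traces
                trace[:, action] += belief           # gradient of Q(b, a) w.r.t. q
                q += alpha * td_error * trace        # on-line TD(lambda) update
                belief, action = next_belief, next_action
        return q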

  • Item
    Using POMDP as Modeling Framework for Network Fault Management
    (1999) He, Qiming; Shayman, Mark A.; ISR
    For high-speed networks, it is important that fault management be proactive, i.e., that it detect, diagnose, and mitigate problems before they result in severe degradation of network performance. Proactive fault management depends on monitoring the network to obtain the data on which to base the manager's decisions. However, monitoring introduces additional overhead that may itself degrade network performance, especially when the network is in a stressed state. Thus, a tradeoff must be made between the amount of data collected and transferred on one hand, and the speed and accuracy of fault detection and diagnosis on the other. Such a tradeoff can be naturally formulated as a Partially Observable Markov Decision Process (POMDP).

    Since exact solution of POMDPs with a realistic number of states is computationally prohibitive, we develop a fast reinforcement-learning-based algorithm that learns the decision rule in an approximate network simulator, making it quickly deployable to the real network. Simulation results are given for diagnosing a switch fault in an ATM network. This approach can be applied to centralized fault management or used to construct intelligent agents for distributed fault management.
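
    As a rough illustration of how such a monitoring tradeoff can be cast as a POMDP, the sketch below lays out the ingredients: fault states, probing actions, alarm observations, a reward that trades probe overhead against fault cost, and the Bayes belief update. The names and the reward decomposition are assumptions made for this example, not the model used in the paper.

    from dataclasses import dataclass
    from typing import Callable, Sequence
    import numpy as np

    @dataclass
    class FaultManagementPOMDP:
        # Illustrative POMDP skeleton for the monitoring/diagnosis tradeoff;
        # the example state, action, and observation labels are placeholders.
        states: Sequence[str]            # e.g. ("healthy", "degrading", "faulty")
        actions: Sequence[str]           # e.g. ("idle", "light_probe", "heavy_probe", "repair")
        observations: Sequence[str]      # e.g. ("no_alarm", "soft_alarm", "hard_alarm")
        transition: np.ndarray           # P[s' | s, a], shape (|S|, |A|, |S|)
        observation_model: np.ndarray    # P[o | s', a], shape (|S|, |A|, |O|)
        monitor_cost: Callable[[str], float]   # overhead of a probing action
        fault_penalty: Callable[[str], float]  # cost of operating in a degraded state

        def reward(self, state: str, action: str) -> float:
            # Trade off monitoring overhead against the cost of undetected faults.
            return -self.monitor_cost(action) - self.fault_penalty(state)

        def belief_update(self, belief: np.ndarray, action: str, obs: str) -> np.ndarray:
            # Standard Bayes filter over the discrete fault states.
            a = self.actions.index(action)
            o = self.observations.index(obs)
            predicted = belief @ self.transition[:, a, :]           # predict step
            updated = predicted * self.observation_model[:, a, o]   # correct step
            return updated / updated.sum()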