Optimal Risk Sensitive Control of Semi-Markov Decision Processes

Chawla, Jay P.

Files

PhD_2000-8.pdf (1.08 MB)

No. of downloads: 1296

Date

2000

Authors

Chawla, Jay P.

Advisor

Marcus, Steven I.
Shayman, Mark A.

Abstract

In this thesis, we study risk-sensitive cost minimization in semi-Markov decision processes. The main thrust of the thesis concerns the minimization of the average risk-sensitive cost over the infinite horizon.

Existing theory is expanded in two directions: the semi-Markov case is considered, and non-irreducible chains are considered. In particular, the analysis of the non-irreducible case is a significant addition to the literature, since many real-world systems do not exhibit irreducibility under all stationary Markov policies. Extension of existing results to the semi-Markov case is significant because it requires the definition of a new dynamic programming equation and a technically challenging adaptation of the Perron-Frobenius eigenvalue from the discrete time case.
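
For orientation, the discrete time result referred to above can be sketched numerically. The sketch below is an illustration and is not taken from the thesis: it assumes a finite, irreducible, aperiodic chain under a fixed stationary policy, with transition matrix P, per-step costs c, and risk parameter gamma > 0, in which case the average risk-sensitive cost is (1/gamma) times the logarithm of the Perron-Frobenius eigenvalue of the matrix with entries exp(gamma * c[i]) * P[i][j]. The function names and the power-iteration method are choices made here for illustration; the semi-Markov equation developed in the thesis is not reproduced.

```python
import math

def perron_eigenvalue(Q, iters=1000, tol=1e-13):
    """Largest eigenvalue of a nonnegative primitive matrix Q by power iteration."""
    n = len(Q)
    v = [1.0] * n          # positive starting vector
    rho = 0.0
    for _ in range(iters):
        w = [sum(Q[i][j] * v[j] for j in range(n)) for i in range(n)]
        new_rho = max(w)   # v is scaled so that max(v) == 1
        v = [x / new_rho for x in w]
        if abs(new_rho - rho) < tol * new_rho:
            return new_rho
        rho = new_rho
    return rho

def risk_sensitive_average_cost(P, c, gamma):
    """Discrete time average risk-sensitive cost of a fixed policy (illustrative).

    Assumes P is irreducible and aperiodic and gamma > 0.
    """
    n = len(P)
    # "Twisted" matrix: row i of P scaled by exp(gamma * c[i]).
    Q = [[math.exp(gamma * c[i]) * P[i][j] for j in range(n)] for i in range(n)]
    return math.log(perron_eigenvalue(Q)) / gamma

# Hypothetical two-state example.
P = [[0.9, 0.1], [0.5, 0.5]]
c = [1.0, 3.0]
risk_neutral_like = risk_sensitive_average_cost(P, c, 1e-4)
risk_averse = risk_sensitive_average_cost(P, c, 1.0)
```

For small gamma the value approaches the ordinary long-run average cost of the chain, and it grows as gamma increases, which reflects the heavier weight placed on costly sample paths.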

In order to determine an optimal policy, new concepts in the classification of Markov chains need to be introduced. This is because in the non-irreducible case, the average risk sensitive cost objective function permits extremely unlikely events to exert a controlling influence on costs. We define equivalence classes of states called 'strongly communicating classes' and formulate in terms of them a new characterization of the underlying structure of Markov Decision Problems and Markov chains.

In the risk sensitive case, the expected cost incurred prior to a stopping time with finite expected value can be infinite. For this reason, we introduce an assumption: reachability with finite cost. This is the fundamental assumption required to achieve the major results of this thesis.
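
A small example makes the first claim concrete. It is an illustration under assumptions made here, not an example from the thesis: suppose a process continues with probability p at each step and stops otherwise, so the stopping time T is geometric with finite mean 1/(1 - p), and suppose each step costs c. The risk-sensitive quantity E[exp(gamma * c * T)] is then a geometric series with ratio p * exp(gamma * c), which diverges as soon as that ratio reaches 1.

```python
import math

def expected_stopping_time(p):
    """Mean of a geometric stopping time with continuation probability p."""
    return 1.0 / (1.0 - p)

def exponential_cost(p, c, gamma):
    """E[exp(gamma * c * T)] in closed form, or math.inf if the series diverges."""
    ratio = p * math.exp(gamma * c)
    if ratio >= 1.0:
        return math.inf
    return (1.0 - p) * math.exp(gamma * c) / (1.0 - ratio)

def exponential_cost_partial_sum(p, c, gamma, n_terms):
    """Sum of the first n_terms terms: (1 - p) * p**(n - 1) * exp(gamma * c * n)."""
    term = (1.0 - p) * math.exp(gamma * c)   # n = 1 term
    ratio = p * math.exp(gamma * c)
    total = 0.0
    for _ in range(n_terms):
        total += term
        term *= ratio
    return total

# Hypothetical numbers: the mean stopping time is 2 steps in both cases.
mean_T = expected_stopping_time(0.5)
finite_case = exponential_cost(0.5, 1.0, 0.5)    # ratio < 1, finite
infinite_case = exponential_cost(0.5, 1.0, 1.0)  # ratio > 1, infinite
```

In this toy setting the exponential cost is finite only when gamma is below log(1/p) / c, which shows why a condition stronger than a finite expected stopping time is needed.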

We explore existence conditions for an optimal policy, optimality equations, and behavior for large and small values of the risk-sensitivity parameter. (Only non-negative risk parameters are discussed in this thesis, i.e., the risk-averse and risk-neutral cases, not the risk-seeking case.) Ramifications for the risk-neutral objective function are also analyzed. Furthermore, a simple solution technique for finding an optimal policy, which we call 'recursive computation' and which is applicable to small state spaces, is described through examples.

The countable state space case is explored, and results that hold only for a finite state space are also presented. Other, related objective functions such as sample path cost are analyzed and discussed.

We also explore finite time horizon semi-Markov problems, and present a general technique for solving them. We define a new objective function, the minimization of which is called the 'deadline problem'. This is a problem in which the probability of reaching the goal state in a set period of time is maximized. We transform the deadline problem objective function into an equivalent finite-horizon risk sensitive objective function.

URI (handle)

http://hdl.handle.net/1903/6150

Collections

Institute for Systems Research Technical Reports
