Stochastic Systems with Cumulative Prospect Theory

Stochastic control problems arise in many fields. Traditionally, the most widely used class of performance criteria in stochastic control has been risk-neutral. More recent attempts to introduce risk sensitivity into stochastic control problems include the application of utility functions. The decision theory community has long debated the merits of expected utility as a model of human behavior, a debate exemplified by the Allais paradox. Supported by strong experimental evidence, performance measures based on Cumulative Prospect Theory (CPT) have been proposed as alternatives to expected-utility-based measures for evaluating human-centric systems. Our goal is to study stochastic control problems using performance measures derived from CPT.
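To make the notion of a CPT-based performance measure concrete, the following minimal sketch evaluates a discrete prospect with rank-dependent decision weights. The functional forms and default parameters (alpha = beta = 0.88, lambda = 2.25, gamma = 0.61, delta = 0.69) are the commonly cited Tversky and Kahneman (1992) estimates; they are assumptions for illustration and are not taken from this thesis. The function names are hypothetical.

```python
def tk_weight(p, g):
    """Inverse-S probability weighting function (assumed Tversky-Kahneman form)."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    return p**g / (p**g + (1.0 - p) ** g) ** (1.0 / g)


def tk_value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Value function: concave for gains, convex and steeper for losses."""
    return x**alpha if x >= 0 else -lam * (-x) ** beta


def cpt_value(outcomes, probs, gamma=0.61, delta=0.69):
    """CPT value of a discrete prospect relative to a reference point of 0."""
    pairs = sorted(zip(outcomes, probs))
    total = 0.0

    # Losses: accumulate probability from the worst outcome upward.
    cum = 0.0
    for x, p in pairs:
        if x >= 0:
            break
        weight = tk_weight(cum + p, delta) - tk_weight(cum, delta)
        total += weight * tk_value(x)
        cum += p

    # Gains: accumulate probability from the best outcome downward.
    cum = 0.0
    for x, p in reversed(pairs):
        if x < 0:
            break
        weight = tk_weight(cum + p, gamma) - tk_weight(cum, gamma)
        total += weight * tk_value(x)
        cum += p

    return total


# A fair coin flip over -100 / +100 has zero expected value,
# but its CPT value is negative under these assumed parameters.
fair_bet = cpt_value([-100.0, 100.0], [0.5, 0.5])
```

Under these assumed parameters, small probabilities are overweighted and losses loom larger than gains, which is the kind of behavior an expected utility criterion does not capture.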

The first part of this thesis solves the problem of evaluating Markov decision processes (MDPs) using CPT-based performance measures. A well-known method for solving MDPs is dynamic programming, which has traditionally been applied with an expected utility criterion. When the performance measure is CPT-inspired, several complications arise. First, solving a problem via dynamic programming requires the performance criterion to have a recursive structure, which not all CPT-based criteria have. Second, the traditional optimality criteria must be proved anew for the modified problems (i.e., MDPs with CPT-based performance criteria). The theorems stated in this part of the thesis answer the question: what conditions must a CPT-inspired criterion satisfy for the corresponding MDP to be solvable via dynamic programming?
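For reference, the recursive structure that dynamic programming relies on can be seen in standard value iteration for a finite, discounted, risk-neutral MDP. The sketch below shows only this classical baseline, not the CPT-based recursion developed in the thesis. The data layout (`P`, `R`) and the toy two-state MDP are hypothetical.

```python
def value_iteration(P, R, discount=0.9, tol=1e-10, max_iter=10_000):
    """Risk-neutral value iteration.

    P[s][a] is a list of (probability, next_state) pairs and
    R[s][a] is the immediate reward for taking action a in state s.
    """
    n_states = len(P)
    V = [0.0] * n_states
    for _ in range(max_iter):
        # Bellman update: the value of a state depends only on the
        # immediate reward and the values of successor states.
        V_new = [
            max(
                R[s][a] + discount * sum(p * V[t] for p, t in P[s][a])
                for a in range(len(P[s]))
            )
            for s in range(n_states)
        ]
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new
        V = V_new
    return V


# Toy MDP: in state 0 the agent may stay (reward 0) or move to state 1
# (reward 1); state 1 is absorbing with reward 2 per step.
P_toy = [[[(1.0, 0)], [(1.0, 1)]], [[(1.0, 1)]]]
R_toy = [[0.0, 1.0], [2.0]]
V_toy = value_iteration(P_toy, R_toy, discount=0.5)
```

The Bellman update works because the expected discounted reward decomposes into an immediate term plus a discounted continuation term. A criterion that distorts cumulative probabilities over the whole trajectory need not decompose this way, which is the difficulty the paragraph above describes.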

The second part of this thesis deals with stochastic global optimization problems. Using ideas from cumulative prospect theory, we introduce a novel model-based randomized optimization algorithm: Cumulative Weighting Optimization (CWO). The key contributions of our research are: 1) proving that the algorithm converges to an optimal solution under a mild assumption on the initial condition; and 2) showing that the well-known cross-entropy optimization algorithm is a special case of CWO-based algorithms. To the best of the author's knowledge, there is no previous convergence proof for the cross-entropy method. Numerical experiments have demonstrated that a CWO-based algorithm can find a better solution than the cross-entropy method.
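As background, a generic cross-entropy method for maximizing a one-dimensional function can be sketched as follows. This is the textbook elite-sample variant with a Gaussian sampling model, not the CWO algorithm itself, whose weighting scheme is not specified in this abstract. The function name, defaults, and test objective are all illustrative assumptions.

```python
import random
import statistics


def cross_entropy_maximize(f, mu=0.0, sigma=5.0, n_samples=200,
                           elite_frac=0.1, n_iters=50, seed=0):
    """Generic cross-entropy method with a 1-D Gaussian sampling model."""
    rng = random.Random(seed)
    n_elite = max(2, int(n_samples * elite_frac))
    for _ in range(n_iters):
        # Sample candidate solutions from the current model.
        samples = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        # Keep the best-performing fraction (the "elite" samples).
        elite = sorted(samples, key=f, reverse=True)[:n_elite]
        # Refit the model to the elite samples.
        mu = statistics.fmean(elite)
        sigma = statistics.pstdev(elite) + 1e-12  # avoid a zero-width model
    return mu


# Hypothetical objective with a single maximum at x = 3.
best_x = cross_entropy_maximize(lambda x: -(x - 3.0) ** 2)
```

In this variant every elite sample receives equal weight and every other sample receives zero weight. Replacing that hard cutoff with a different weighting of the ranked samples is one way to read the claim that cross-entropy is a special case of a more general family.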

Finally, as future work, we would like to apply some of the ideas from cumulative prospect theory to games. In this thesis, we present a numerical example in which cumulative prospect theory has an unexpected effect on the equilibrium points of the classic prisoner's dilemma game.