Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control

Cited: 0
Authors
Prashanth, L. A. [1 ]
Jie, Cheng [2 ]
Fu, Michael [1 ,3 ]
Marcus, Steve [1 ,4 ]
Szepesvari, Csaba [5 ]
Affiliations
[1] Univ Maryland, Inst Syst Res, College Pk, MD 20742 USA
[2] Univ Maryland, Dept Math, College Pk, MD 20742 USA
[3] Univ Maryland, Robert H Smith Sch Business, College Pk, MD USA
[4] Univ Maryland, Dept Elect & Comp Engn, College Pk, MD USA
[5] Univ Alberta, Dept Comp Sci, Edmonton, AB, Canada
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48 | 2016 / Vol. 48
Funding
U.S. National Science Foundation; Natural Sciences and Engineering Research Council of Canada;
Keywords
STOCHASTIC-APPROXIMATION; VARIANCE; OPTIMIZATION; ECONOMICS; RISK;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cumulative prospect theory (CPT) is known to model human decisions well, with substantial empirical evidence supporting this claim. CPT works by distorting probabilities and is more general than the classic expected utility and coherent risk measures. We bring this idea to a risk-sensitive reinforcement learning (RL) setting and design algorithms for both estimation and control. The RL setting presents two particular challenges when CPT is applied: estimating the CPT objective requires estimating the entire distribution of the value function, and the optimal policy may need to be randomized. The estimation scheme that we propose uses the empirical distribution to estimate the CPT-value of a random variable. We then use this scheme in the inner loop of a CPT-value optimization procedure that is based on the well-known simulation optimization idea of simultaneous perturbation stochastic approximation (SPSA). We provide theoretical convergence guarantees for all the proposed algorithms and also illustrate the usefulness of CPT-based criteria in a traffic signal control application.
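The sketch below illustrates the kind of empirical-distribution CPT-value estimator the abstract refers to: sort i.i.d. samples and combine them with distorted tail probabilities. The Tversky-Kahneman weighting function, the piecewise-linear gain/loss utilities, and the parameter value 0.61 are illustrative assumptions, not necessarily the paper's exact choices.

```python
import numpy as np

def weight(p, eta=0.61):
    """Tversky-Kahneman probability-weighting function (illustrative choice)."""
    return p**eta / (p**eta + (1.0 - p)**eta) ** (1.0 / eta)

def cpt_value_estimate(samples,
                       u_plus=lambda x: np.maximum(x, 0.0),    # utility on gains (assumed)
                       u_minus=lambda x: np.maximum(-x, 0.0),  # utility on losses (assumed)
                       w_plus=weight, w_minus=weight):
    """Estimate the CPT-value of a random variable from i.i.d. samples,
    using order statistics of the empirical distribution with distorted weights."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    # Gains part: decision weights from the distorted upper tail of the empirical CDF.
    gains = np.sum(u_plus(x) * (w_plus((n + 1 - i) / n) - w_plus((n - i) / n)))
    # Losses part: decision weights from the distorted lower tail of the empirical CDF.
    losses = np.sum(u_minus(x) * (w_minus(i / n) - w_minus((i - 1) / n)))
    return gains - losses

# Usage: estimate the CPT-value of a standard normal outcome.
rng = np.random.default_rng(0)
print(cpt_value_estimate(rng.normal(size=10_000)))
```

In a control loop, an estimator of this form would sit inside the inner loop, supplying noisy CPT-value evaluations of the policy that an SPSA-style outer loop perturbs and updates.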
Pages: 10