Risk-sensitive and risk-neutral multiarmed bandits

Cited by: 21

Authors:

Denardo, Eric V.

Park, Haechurl

Rothblum, Uriel G.

Affiliations:

[1] Yale Univ, Ctr Syst Sci, New Haven, CT 06520 USA

[2] Chung Ang Univ, Dept Business Adm, Seoul 156756, South Korea

[3] Technion Israel Inst Technol, Fac Ind Engn & Management, IL-32000 Haifa, Israel

Source:

MATHEMATICS OF OPERATIONS RESEARCH | 2007 / Vol. 32 / No. 02

Keywords:

multiarmed bandits; exponential utility; risk-sensitive Markov decision processes; optimal stopping;

DOI:

10.1287/moor.1060.0240

Chinese Library Classification (CLC):

C93 [Management science]; O22 [Operations research];

Discipline classification codes:

070105 ; 12 ; 1201 ; 1202 ; 120202 ;

Abstract:

For the multiarmed bandit, the classic result is probabilistic: each state of each bandit (Markov chain with rewards) has an index that is determined by an optimal stopping time for that state's bandit, and expected discounted income is maximized by playing at each epoch a bandit whose current state has the largest index. Our approach is analytic, not probabilistic. It uses pairwise comparison in place of stopping times. A simple recursion assigns to each state of each bandit a utility and an amplification of future utility that depend solely on the data for that state's bandit. These utilities and amplifications determine whether or not one state dominates another. We show that it is optimal to play at each epoch any bandit whose current state is not dominated by the current states of the other bandits. We obtain this result by a coherent analysis that encompasses three models: one with risk-averse exponential utility, one with risk-seeking exponential utility, and one with linear utility and either stopping or discounting. We also show that the risk-seeking case and a model of Nash [Nash, P. 1980. A generalized bandit problem. J. Roy. Statist. Soc. B 42 165-169] are equivalent to each other.
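
As an illustration of the classic largest-index rule that the abstract takes as its starting point (not the pairwise-comparison recursion developed in the paper), the following minimal Python sketch computes a discounted index for every state of a single bandit and then picks the bandit to play. It assumes the well-known restart-in-state characterization of the Gittins index rather than stopping times; the function names `gittins_indices` and `choose_bandit`, the list-of-lists transition matrix `P`, the reward vector `r`, and the discount factor `beta` are all hypothetical choices made for this example.

```python
def gittins_indices(P, r, beta, tol=1e-12, max_iter=100_000):
    """Index of every state of one bandit (linear utility, discount beta).

    Assumption: uses the restart-in-state formulation, in which the index
    of state i equals (1 - beta) times the value at i of a problem where,
    in every state, one may either continue or restart from state i.
    P is a row-stochastic matrix (list of lists); r is the reward vector.
    """
    n = len(r)
    indices = []
    for i in range(n):
        V = [0.0] * n
        for _ in range(max_iter):
            # Value of continuing one step from each state j.
            Q = [r[j] + beta * sum(P[j][k] * V[k] for k in range(n))
                 for j in range(n)]
            # In each state: continue from j, or restart as if in state i.
            V_new = [max(Q[j], Q[i]) for j in range(n)]
            delta = max(abs(a - b) for a, b in zip(V_new, V))
            V = V_new
            if delta < tol:
                break
        indices.append((1.0 - beta) * V[i])
    return indices


def choose_bandit(current_states, index_tables):
    """Return the position of a bandit whose current state has the largest index."""
    return max(range(len(current_states)),
               key=lambda b: index_tables[b][current_states[b]])


# Toy bandit: state 0 pays 0 and moves to state 1; state 1 pays 1 forever.
P = [[0.0, 1.0], [0.0, 1.0]]
r = [0.0, 1.0]
table = gittins_indices(P, r, beta=0.9)
```

For this toy chain the index of state 1 is its constant reward and the index of state 0 is that reward discounted by one period. Each index table depends only on the data of its own bandit, which is the decomposition property the abstract describes for both the classic indices and the paper's utilities and amplifications.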

Pages: 374-394

Page count: 21

50 items in total

[31] Risk-sensitive control with HARA
Lim, AEB
Zhou, XY
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2001, 46 (04) : 563 - 578
[32] Risk-Sensitive Reinforcement Learning
Shen, Yun
Tobia, Michael J.
Sommer, Tobias
Obermayer, Klaus
NEURAL COMPUTATION, 2014, 26 (07) : 1298 - 1328
[33] The output decision of a risk-neutral producer under risk of liquidation
Mahul, O
AMERICAN JOURNAL OF AGRICULTURAL ECONOMICS, 2000, 82 (01) : 49 - 58
[34] Risk-Sensitive Investment Management
Danilova, Albina
QUANTITATIVE FINANCE, 2015, 15 (12) : 1913 - 1914
[35] Accurate Updating for the Risk-Sensitive
Campbell-Moore, Catrin
Salow, Bernhard
BRITISH JOURNAL FOR THE PHILOSOPHY OF SCIENCE, 2022, 73 (03) : 751 - 776
[36] On the comparison of risk-neutral and risk-averse newsvendor problems
Katariya, Abhilasha Prakash
Cetinkaya, Sila
Tekin, Eylem
JOURNAL OF THE OPERATIONAL RESEARCH SOCIETY, 2014, 65 (07) : 1090 - 1107
[37] Supply network design: Risk-averse or risk-neutral?
Madadi, AliReza
Kurz, Mary E.
Taaffe, Kevin M.
Sharp, Julia L.
Mason, Scott J.
COMPUTERS & INDUSTRIAL ENGINEERING, 2014, 78 : 55 - 65
[38] Indefinite risk-sensitive control
Gashi, Bujar
Zhang, Moyu
EUROPEAN JOURNAL OF CONTROL, 2023, 69
[39] Risk-sensitive production planning
Brown Univ, Providence, United States
Proc IEEE Conf Decis Control, (2686-2691):
[40] Risk-sensitive dual control
Dey, S
Moore, JB
INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 1997, 7 (12) : 1047 - 1055
