Risk-Averse Allocation Indices for Multiarmed Bandit Problem

被引:4
作者
Malekipirbazari, Milad [1 ]
Cavus, Ozlem [1 ]
机构
[1] Bilkent Univ, Dept Ind Engn, TR-06800 Ankara, Turkey
关键词
Markov processes; Indexes; Resource management; Heuristic algorithms; Dynamic scheduling; Routing; Random variables; Coherent risk measures; dynamic allocation index; dynamic risk-aversion; Gittins index; multiarmed bandit (MAB);
D O I
10.1109/TAC.2021.3053539
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
In classical multiarmed bandit problem, the aim is to find a policy maximizing the expected total reward, implicitly assuming that the decision-maker is risk-neutral. On the other hand, the decision-makers are risk-averse in some real-life applications. In this article, we design a new setting based on the concept of dynamic risk measures where the aim is to find a policy with the best risk-adjusted total discounted outcome. We provide a theoretical analysis of multiarmed bandit problem with respect to this novel setting and propose a priority-index heuristic which gives risk-averse allocation indices having a structure similar to Gittins index. Although an optimal policy is shown not always to have index-based form, empirical results express the excellence of this heuristic and show that with risk-averse allocation indices we can achieve optimal or near-optimal interpretable policies.
引用
收藏
页码:5522 / 5529
页数:8
相关论文
共 22 条
[1]   Coherent measures of risk [J].
Artzner, P ;
Delbaen, F ;
Eber, JM ;
Heath, D .
MATHEMATICAL FINANCE, 1999, 9 (03) :203-228
[2]   Computational Methods for Risk-Averse Undiscounted Transient Markov Models [J].
Cavus, Ozlem ;
Ruszczynski, Andrzej .
OPERATIONS RESEARCH, 2014, 62 (02) :401-417
[3]   Risk aversion, road choice, and the one-armed bandit problem [J].
Chancelier, Jean-Philippe ;
De Lara, Michel ;
de Palma, Andre .
TRANSPORTATION SCIENCE, 2007, 41 (01) :1-14
[4]   Risk aversion in expected intertemporal discounted utilities bandit problems [J].
Chancelier, Jean-Philippe ;
De Lara, Michel ;
de Palma, Andre .
THEORY AND DECISION, 2009, 67 (04) :433-440
[5]   Risk-sensitive and risk-neutral multiarmed bandits [J].
Denardo, Eric V. ;
Park, Haechurl ;
Rothblum, Uriel G. .
MATHEMATICS OF OPERATIONS RESEARCH, 2007, 32 (02) :374-394
[6]   The multi-armed bandit, with constraints [J].
Denardo, Eric V. ;
Feinberg, Eugene A. ;
Rothblum, Uriel G. .
ANNALS OF OPERATIONS RESEARCH, 2013, 208 (01) :37-62
[7]  
Galichet N., 2013, PMLR, P245
[8]  
GITTINS JC, 1979, J ROY STAT SOC B MET, V41, P148
[9]   A Generalized Gittins Index for a Class of Multiarmed Bandits with General Resource Requirements [J].
Glazebrook, K. D. ;
Minty, R. .
MATHEMATICS OF OPERATIONS RESEARCH, 2009, 34 (01) :26-44
[10]   Risk-aware multi-armed bandit problem with application to portfolio selection [J].
Huo, Xiaoguang ;
Fu, Feng .
ROYAL SOCIETY OPEN SCIENCE, 2017, 4 (11)