Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs

Cited by: 1

Authors:

Lin, Yifan [1]

Wang, Yuhao [1]

Zhou, Enlu [1]

Affiliations:

[1] Georgia Inst Technol, Sch Ind & Syst Engn, Atlanta, GA 30332 USA

Source:

JOURNAL OF SYSTEMS SCIENCE AND SYSTEMS ENGINEERING | 2022

Funding:

National Science Foundation (United States)

Keywords:

Multi-armed bandit; context; risk-averse; Thompson sampling;

DOI:

10.1007/s11518-022-5541-9

Chinese Library Classification (CLC):

C93 [Management Science]; O22 [Operations Research]

Discipline classification codes:

070105 ; 12 ; 1201 ; 1202 ; 120202 ;

Abstract:

In this paper we consider the contextual multi-armed bandit problem for linear payoffs under a risk-averse criterion. At each round, contexts are revealed for each arm, and the decision maker chooses one arm to pull and receives the corresponding reward. In particular, we consider mean-variance as the risk criterion, and the best arm is the one with the largest mean-variance reward. We apply the Thompson sampling algorithm for the disjoint model, and provide a comprehensive regret analysis for a variant of the proposed algorithm. For $T$ rounds, $K$ actions, and $d$-dimensional feature vectors, we prove a regret bound of $O\left(\left(1+\rho+\frac{1}{\rho}\right) d \ln T \ln\frac{K}{\delta} \sqrt{d K T^{1+2\epsilon} \ln\frac{K}{\delta} \frac{1}{\epsilon}}\right)$ that holds with probability $1-\delta$ under the mean-variance criterion with risk tolerance $\rho$, for any $0 < \epsilon < \frac{1}{2}$, $0 < \delta < 1$. The empirical performance of our proposed algorithms is demonstrated via a portfolio selection problem.
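The abstract describes Thompson sampling on a disjoint linear model with a mean-variance objective. The sketch below illustrates that general idea only; it is not the authors' algorithm or the variant they analyze. Several pieces are assumptions made for illustration: the class name `MeanVarianceLinearTS`, the normal-inverse-gamma conjugate prior on each arm's coefficients and noise variance, the hyperparameters `a0` and `b0`, and the score `rho * mean - variance` as one reading of "largest mean-variance reward".

```python
import numpy as np


class MeanVarianceLinearTS:
    """Illustrative Thompson sampling with one linear model per arm (disjoint model).

    Assumption: reward = x @ theta_k + noise with variance sigma2_k, and a
    normal-inverse-gamma prior on (theta_k, sigma2_k). Not the paper's algorithm.
    """

    def __init__(self, n_arms, dim, rho, a0=1.0, b0=1.0, seed=0):
        self.rho = rho                      # risk tolerance weighting the mean
        self.a0, self.b0 = a0, b0           # hypothetical inverse-gamma prior
        self.rng = np.random.default_rng(seed)
        self.B = np.stack([np.eye(dim) for _ in range(n_arms)])  # precision matrices
        self.f = np.zeros((n_arms, dim))    # running sum of reward * context
        self.yy = np.zeros(n_arms)          # running sum of squared rewards
        self.n = np.zeros(n_arms)           # number of pulls per arm

    def select(self, contexts):
        """Sample (theta, sigma2) per arm and pick the best sampled score."""
        scores = []
        for k, x in enumerate(contexts):
            B_inv = np.linalg.inv(self.B[k])
            B_inv = (B_inv + B_inv.T) / 2.0           # keep covariance symmetric
            mu = B_inv @ self.f[k]                    # posterior mean of theta
            a = self.a0 + self.n[k] / 2.0
            b = self.b0 + 0.5 * max(self.yy[k] - mu @ self.B[k] @ mu, 0.0)
            sigma2 = 1.0 / self.rng.gamma(shape=a, scale=1.0 / b)  # inverse-gamma draw
            theta = self.rng.multivariate_normal(mu, sigma2 * B_inv)
            # Assumed mean-variance score: rho * sampled mean - sampled variance.
            scores.append(self.rho * (x @ theta) - sigma2)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Update only the pulled arm's sufficient statistics."""
        self.B[arm] += np.outer(x, x)
        self.f[arm] += reward * x
        self.yy[arm] += reward ** 2
        self.n[arm] += 1
```

In use, each round would call `select` with one context vector per arm, observe the reward of the chosen arm, and pass it to `update`. Sampling the variance as well as the coefficients is a design choice of this sketch: it lets an arm with a high mean but noisy rewards lose to a steadier arm when `rho` is small.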

Pages: 22
