Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs

Cited by: 1
Authors
Lin, Yifan [1 ]
Wang, Yuhao [1 ]
Zhou, Enlu [1 ]
Affiliations
[1] Georgia Inst Technol, Sch Ind & Syst Engn, Atlanta, GA 30332 USA
Source
JOURNAL OF SYSTEMS SCIENCE AND SYSTEMS ENGINEERING | 2022
Funding
National Science Foundation (USA);
Keywords
Multi-armed bandit; context; risk-averse; Thompson sampling;
DOI
10.1007/s11518-022-5541-9
Chinese Library Classification (CLC) number
C93 [Management Science]; O22 [Operations Research];
Discipline classification code
070105 ; 12 ; 1201 ; 1202 ; 120202 ;
Abstract
In this paper we consider the contextual multi-armed bandit problem for linear payoffs under a risk-averse criterion. At each round, contexts are revealed for each arm, and the decision maker chooses one arm to pull and receives the corresponding reward. In particular, we consider mean-variance as the risk criterion, and the best arm is the one with the largest mean-variance reward. We apply the Thompson sampling algorithm for the disjoint model, and provide a comprehensive regret analysis for a variant of the proposed algorithm. For $T$ rounds, $K$ actions, and $d$-dimensional feature vectors, we prove a regret bound of $O\left(\left(1 + \rho + \frac{1}{\rho}\right) d \ln T \ln \frac{K}{\delta} \sqrt{d K T^{1+2\epsilon} \ln \frac{K}{\delta} \frac{1}{\epsilon}}\right)$ that holds with probability $1 - \delta$ under the mean-variance criterion with risk tolerance $\rho$, for any $0 < \epsilon < \frac{1}{2}$, $0 < \delta < 1$. The empirical performance of our proposed algorithms is demonstrated via a portfolio selection problem.
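The setting described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm or the variant they analyze. It assumes a per-arm (disjoint) Gaussian posterior over the linear coefficients, a plug-in residual variance estimate per arm, and a mean-variance score of the form `rho * mean - variance`. The class name `MeanVarianceLinearTS`, the posterior scale `v`, and the variance estimator are all hypothetical choices made for illustration.

```python
import numpy as np


class MeanVarianceLinearTS:
    """Illustrative sketch only: disjoint linear Thompson sampling
    scored by an assumed mean-variance criterion rho * mean - variance."""

    def __init__(self, n_arms, dim, rho=1.0, v=1.0, seed=0):
        self.rho = rho  # risk tolerance (assumed to weight the mean)
        self.v = v      # posterior scale (hypothetical tuning parameter)
        self.rng = np.random.default_rng(seed)
        # One ridge-regression state per arm: precision matrix B_a and vector f_a.
        self.B = np.stack([np.eye(dim) for _ in range(n_arms)])
        self.f = np.zeros((n_arms, dim))
        self.n = np.zeros(n_arms, dtype=int)  # pulls per arm
        self.sq_resid = np.zeros(n_arms)      # sum of squared residuals per arm

    def select(self, contexts):
        """contexts: array of shape (n_arms, dim), one context per arm."""
        scores = np.empty(len(contexts))
        for a, x in enumerate(contexts):
            cov = np.linalg.inv(self.B[a])
            theta_hat = cov @ self.f[a]
            # Thompson step: sample coefficients from the Gaussian posterior.
            theta = self.rng.multivariate_normal(theta_hat, self.v ** 2 * cov)
            # Plug-in variance estimate (an assumption; 0 before any pull).
            var_hat = self.sq_resid[a] / self.n[a] if self.n[a] > 0 else 0.0
            scores[a] = self.rho * (x @ theta) - var_hat
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        # Residual is measured against the estimate held before this observation.
        theta_hat = np.linalg.solve(self.B[arm], self.f[arm])
        self.sq_resid[arm] += (reward - x @ theta_hat) ** 2
        self.n[arm] += 1
        self.B[arm] += np.outer(x, x)
        self.f[arm] += reward * x


if __name__ == "__main__":
    # Synthetic demo: arm 0 has the higher mean but much noisier rewards.
    env_rng = np.random.default_rng(42)
    true_theta = np.array([[1.0, 0.5], [0.8, 0.4]])
    noise_sd = np.array([2.0, 0.1])
    agent = MeanVarianceLinearTS(n_arms=2, dim=2, rho=1.0)
    for _ in range(500):
        ctx = env_rng.uniform(0.0, 1.0, size=(2, 2))
        arm = agent.select(ctx)
        reward = ctx[arm] @ true_theta[arm] + noise_sd[arm] * env_rng.normal()
        agent.update(arm, ctx[arm], reward)
    print("pulls per arm:", agent.n)
```

Under this assumed score, a larger `rho` weights the expected payoff more heavily relative to the variance penalty, so the choice of `rho` controls how risk-averse the selection is.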
Pages: 22
Related papers
50 in total
  • [1] Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs
    Yifan Lin
    Yuhao Wang
    Enlu Zhou
    Journal of Systems Science and Systems Engineering, 2023, 32 : 267 - 288
  • [2] Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs
    Lin, Yifan
    Wang, Yuhao
    Zhou, Enlu
    JOURNAL OF SYSTEMS SCIENCE AND SYSTEMS ENGINEERING, 2023, 32 (03) : 267 - 288
  • [3] Risk-Averse Biased Human Policies with a Robot Assistant in Multi-Armed Bandit Settings
    Koller, Michael
    Patten, Timothy
    Vincze, Markus
    THE 14TH ACM INTERNATIONAL CONFERENCE ON PERVASIVE TECHNOLOGIES RELATED TO ASSISTIVE ENVIRONMENTS, PETRA 2021, 2021 : 483 - 488
  • [4] Risk-Averse Multi-Armed Bandit Problems Under Mean-Variance Measure
    Vakili, Sattar
    Zhao, Qing
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2016, 10 (06) : 1093 - 1111
  • [5] Residential HVAC Aggregation Based on Risk-averse Multi-armed Bandit Learning for Secondary Frequency Regulation
    Chen, Xinyi
    Hu, Qinran
    Shi, Qingxin
    Quan, Xiangjun
    Wu, Zaijun
    Li, Fangxing
    JOURNAL OF MODERN POWER SYSTEMS AND CLEAN ENERGY, 2020, 8 (06) : 1160 - 1167
  • [6] THE MULTI-ARMED BANDIT PROBLEM WITH COVARIATES
    Perchet, Vianney
    Rigollet, Philippe
    ANNALS OF STATISTICS, 2013, 41 (02) : 693 - 721
  • [7] Contextual Multi-Armed Bandit for Email Layout Recommendation
    Chen, Yan
    Vankov, Emilian
    Baltrunas, Linas
    Donovan, Preston
    Mehta, Akash
    Schroeder, Benjamin
    Herman, Matthew
    PROCEEDINGS OF THE 17TH ACM CONFERENCE ON RECOMMENDER SYSTEMS, RECSYS 2023, 2023 : 400 - 402
  • [8] Robust control of the multi-armed bandit problem
    Caro, Felipe
    Das Gupta, Aparupa
    ANNALS OF OPERATIONS RESEARCH, 2022, 317 (02) : 461 - 480
  • [9] An Adaptive Algorithm in Multi-Armed Bandit Problem
    Zhang X.
    Zhou Q.
    Liang B.
    Xu J.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2019, 56 (03) : 643 - 654
  • [10] Robust control of the multi-armed bandit problem
    Felipe Caro
    Aparupa Das Gupta
    Annals of Operations Research, 2022, 317 : 461 - 480