Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs

Cited by: 1
Authors
Lin, Yifan [1]
Wang, Yuhao [1]
Zhou, Enlu [1]
Affiliation
[1] Georgia Institute of Technology, School of Industrial and Systems Engineering, Atlanta, GA 30332, USA
Funding
U.S. National Science Foundation
Keywords
Multi-armed bandit; context; risk-averse; Thompson sampling
DOI
10.1007/s11518-022-5541-9
Chinese Library Classification (CLC) numbers
C93 [Management Science]; O22 [Operations Research]
Discipline classification codes
070105; 12; 1201; 1202; 120202
Abstract
In this paper, we consider the contextual multi-armed bandit problem with linear payoffs under a risk-averse criterion. At each round, a context is revealed for each arm, and the decision maker chooses one arm to pull and receives the corresponding reward. In particular, we consider mean-variance as the risk criterion, and the best arm is the one with the largest mean-variance reward. We apply the Thompson sampling algorithm to the disjoint model and provide a comprehensive regret analysis for a variant of the proposed algorithm. For $T$ rounds, $K$ actions, and $d$-dimensional feature vectors, we prove a regret bound of $O\left((1+\rho+\frac{1}{\rho})\, d \ln T \ln\frac{K}{\delta} \sqrt{dK T^{1+2\epsilon} \ln\frac{K}{\delta} \frac{1}{\epsilon}}\right)$ that holds with probability $1-\delta$ under the mean-variance criterion with risk tolerance $\rho$, for any $0<\epsilon<\frac{1}{2}$ and $0<\delta<1$. The empirical performance of the proposed algorithms is demonstrated on a portfolio selection problem.
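The abstract names the ingredients (one linear model per arm in the disjoint model, Thompson sampling, and a mean-variance score with risk tolerance rho) but not the update rules. The sketch below shows one plausible way these pieces fit together; it is a hypothetical illustration, not the algorithm or the variant analyzed in the paper. Assumptions made here and not taken from the source: the mean reward of arm a is x^T theta_a; the mean-variance score is taken as rho * mean - variance, so larger is better; each arm's variance is a plug-in sample variance of its regression residuals; the sampling scale `v`, the class `DisjointMVThompson`, and the function `run_demo` are invented for this example.

```python
import numpy as np


class DisjointMVThompson:
    """Hypothetical sketch: Thompson sampling with one linear model per arm
    (disjoint model) and a mean-variance score rho * mean - variance.
    Illustrative only; not the algorithm analyzed in the paper."""

    def __init__(self, n_arms, dim, rho=1.0, v=0.5, seed=0):
        self.rho = rho  # risk tolerance: weight on the mean relative to the variance
        self.v = v      # assumed scale of the Gaussian sampling distribution
        self.rng = np.random.default_rng(seed)
        # Ridge-regression statistics per arm: B_a = I + sum x x^T, f_a = sum r x
        self.B = [np.eye(dim) for _ in range(n_arms)]
        self.f = [np.zeros(dim) for _ in range(n_arms)]
        # Raw history per arm, used only for the plug-in variance estimate
        self.X = [[] for _ in range(n_arms)]
        self.r = [[] for _ in range(n_arms)]

    def _variance(self, a):
        # Assumption: variance of arm a = sample variance of residuals r - x^T theta_hat;
        # treated as 0 until the arm has at least two observations.
        if len(self.r[a]) < 2:
            return 0.0
        theta_hat = np.linalg.solve(self.B[a], self.f[a])
        residuals = np.array(self.r[a]) - np.array(self.X[a]) @ theta_hat
        return float(np.var(residuals, ddof=1))

    def select(self, contexts):
        """contexts: array of shape (n_arms, dim), one feature vector per arm."""
        scores = []
        for a, x in enumerate(contexts):
            B_inv = np.linalg.inv(self.B[a])
            theta_hat = B_inv @ self.f[a]
            cov = self.v ** 2 * (B_inv + B_inv.T) / 2  # symmetrize for numerical safety
            # Thompson step: draw theta_a ~ N(theta_hat, v^2 B_a^{-1})
            theta_tilde = self.rng.multivariate_normal(theta_hat, cov)
            scores.append(self.rho * float(x @ theta_tilde) - self._variance(a))
        return int(np.argmax(scores))

    def update(self, a, x, reward):
        # Only the pulled arm's statistics change (disjoint model)
        self.B[a] += np.outer(x, x)
        self.f[a] += reward * x
        self.X[a].append(x)
        self.r[a].append(reward)


def run_demo(n_rounds=500, seed=1):
    """Toy environment: arm 1 has the higher mean but far higher noise."""
    env = np.random.default_rng(seed)
    true_theta = [np.array([1.0, 0.5]), np.array([1.2, 0.5])]
    noise_sd = [0.1, 3.0]
    agent = DisjointMVThompson(n_arms=2, dim=2, rho=1.0, seed=seed + 1)
    pulls = [0, 0]
    for _ in range(n_rounds):
        # Context for each arm: an intercept plus one uniform random feature
        contexts = np.column_stack([np.ones(2), env.uniform(size=2)])
        a = agent.select(contexts)
        reward = contexts[a] @ true_theta[a] + noise_sd[a] * env.standard_normal()
        agent.update(a, contexts[a], reward)
        pulls[a] += 1
    return pulls


print(run_demo())  # pull counts per arm
```

In this toy setting, arm 1 has the larger mean (1.2 vs 1.0 on the intercept) but a variance of 9 against 0.01, so its mean-variance score with rho = 1 is far lower. A risk-neutral learner would favor arm 1, while this sketch is expected to concentrate its pulls on arm 0. Estimating the variance from residuals is a simplifying choice made here for brevity.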
Pages: 22
Published in
Journal of Systems Science and Systems Engineering, 2023, 32(3): 267-288