Risk-Averse Multi-Armed Bandits with Unobserved Confounders: A Case Study in Emotion Regulation in Mobile Health

Cited by: 1
Authors
Shen, Yi [1]
Dunn, Jessilyn [2]
Zavlanos, Michael M. [1]
Affiliations
[1] Duke Univ, Dept Mech Engn & Mat Sci, Durham, NC 27708 USA
[2] Duke Univ, Dept Biomed Engn, Durham, NC 27708 USA
Keywords
BOUNDS
DOI
10.1109/CDC51059.2022.9992917
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In this paper, we consider a risk-averse multi-armed bandit (MAB) problem where the goal is to learn a policy that minimizes the risk of low expected return, as opposed to maximizing the expected return itself, which is the objective in the usual risk-neutral MAB setting. Specifically, we formulate this problem as a transfer learning problem between an expert and a learner agent in the presence of contexts that are observable by the expert but not by the learner. Such contexts are therefore unobserved confounders (UCs) from the learner's perspective. Given a dataset generated by the expert that excludes the UCs, the goal for the learner is to identify the true minimum-risk arm with fewer online learning steps, while avoiding the biased decisions that the UCs in the expert's data can cause. To achieve this, we first formulate a mixed-integer linear program that uses the expert data to obtain causal bounds on the Conditional Value at Risk (CVaR) of the true return for all possible UCs. We then transfer these causal bounds to the learner by formulating a causal-bound-constrained Upper Confidence Bound (UCB) algorithm that reduces the variance of online exploration and, as a result, identifies the true minimum-risk arm faster, with fewer new samples. We provide a regret analysis of our proposed method and show that it can achieve zero or constant regret. Finally, we use an example of emotion regulation in mobile health to show that our proposed method outperforms risk-averse MAB methods without causal bounds.
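The abstract's idea of constraining a UCB index with causal bounds on CVaR can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function names, the exploration bonus, the constant `c`, and the convention that CVaR is the mean of the worst `alpha` fraction of returns (so a larger value means lower risk) are all assumptions made for illustration. The per-arm bounds `(lower, upper)` stand in for the output of the mixed-integer linear program, which is not reproduced here.

```python
import math


def empirical_cvar(samples, alpha):
    """Mean of the lowest alpha-fraction of observed returns (assumed CVaR convention)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    ordered = sorted(samples)
    k = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[:k]) / k


def clipped_ucb_index(samples, t, alpha, lower, upper, c=1.0):
    """Optimistic CVaR estimate truncated to hypothetical causal bounds [lower, upper]."""
    if not samples:
        # With no online data, the causal upper bound is the only optimistic value available.
        return upper
    # Illustrative bonus of the usual sqrt(log t / n) form; not taken from the paper.
    bonus = c * math.sqrt(math.log(max(t, 2)) / len(samples))
    return min(max(empirical_cvar(samples, alpha) + bonus, lower), upper)


def select_arm(history, t, alpha, bounds):
    """Pick the arm with the largest clipped index among arms the bounds cannot rule out."""
    best_lower = max(lo for lo, _ in bounds)
    # An arm whose upper bound is below another arm's lower bound cannot be the best arm.
    candidates = [a for a, (_, hi) in enumerate(bounds) if hi >= best_lower]
    return max(
        candidates,
        key=lambda a: clipped_ucb_index(history[a], t, alpha, *bounds[a]),
    )
```

In this sketch, arms that the bounds exclude are never pulled, and the index of each remaining arm can never exceed its upper bound, which is one way tighter bounds could shorten exploration.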
Pages: 144-149
Number of pages: 6
Related papers
15 in total
  • [1] Robust Risk-Averse Stochastic Multi-armed Bandits
    Maillard, Odalric-Ambrym
    ALGORITHMIC LEARNING THEORY (ALT 2013), 2013, 8139 : 218 - 233
  • [2] Risk-averse Ambulance Redeployment via Multi-armed Bandits
    Sahin, Umitcan
    Yucesoy, Veysel
    Koc, Aykut
    Tekin, Cem
    2018 26TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2018
  • [3] Offline Contextual Multi-armed Bandits for Mobile Health Interventions: A Case Study on Emotion Regulation
    Ameko, Mawulolo K.
    Beltzer, Miranda L.
    Cai, Lihua
    Boukhechba, Mehdi
    Teachman, Bethany A.
    Barnes, Laura E.
    RECSYS 2020: 14TH ACM CONFERENCE ON RECOMMENDER SYSTEMS, 2020, : 249 - 258
  • [4] Statistically Robust, Risk-Averse Best Arm Identification in Multi-Armed Bandits
    Kagrecha, Anmol
    Nair, Jayakrishnan
    Jagannathan, Krishna
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2022, 68 (08) : 5248 - 5267
  • [5] A revised approach for risk-averse multi-armed bandits under CVaR criterion
    Khajonchotpanya, Najakorn
    Xue, Yilin
    Rujeerapaiboon, Napat
    OPERATIONS RESEARCH LETTERS, 2021, 49 (04) : 465 - 472
  • [6] A Risk-Averse Framework for Non-Stationary Stochastic Multi-Armed Bandits
    Alami, Reda
    Mahfoud, Mohammed
    Achab, Mastane
    2023 23RD IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS, ICDMW 2023, 2023, : 272 - 280
  • [7] Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs
    Lin, Yifan
    Wang, Yuhao
    Zhou, Enlu
    JOURNAL OF SYSTEMS SCIENCE AND SYSTEMS ENGINEERING, 2022
  • [8] Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs
    Lin, Yifan
    Wang, Yuhao
    Zhou, Enlu
    JOURNAL OF SYSTEMS SCIENCE AND SYSTEMS ENGINEERING, 2023, 32 (03) : 267 - 288
  • [9] Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs
    Yifan Lin
    Yuhao Wang
    Enlu Zhou
    Journal of Systems Science and Systems Engineering, 2023, 32 : 267 - 288
  • [10] Residential HVAC Aggregation Based on Risk-averse Multi-armed Bandit Learning for Secondary Frequency Regulation
    Chen, Xinyi
    Hu, Qinran
    Shi, Qingxin
    Quan, Xiangjun
    Wu, Zaijun
    Li, Fangxing
    JOURNAL OF MODERN POWER SYSTEMS AND CLEAN ENERGY, 2020, 8 (06) : 1160 - 1167