X-Armed Bandits: Optimizing Quantiles, CVaR and Other Risks

Cited by: 0
Authors
Torossian, Leonard [1, 2]
Garivier, Aurelien [3]
Picheny, Victor [4]
Affiliations
[1] Univ Toulouse, INRA, Toulouse, France
[2] Inst Math Toulouse, Toulouse, France
[3] Univ Lyon, ENS Lyon, Lyon, France
[4] PROWLER.io, 72 Hills Rd, Cambridge, England
Source
ASIAN CONFERENCE ON MACHINE LEARNING, VOL 101 | 2019 / Vol. 101
Keywords
Optimistic optimization; Risk-averse solutions; Quantile optimization; CVaR optimization;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose and analyze StoROO, an algorithm derived from StoOO for risk optimization of stochastic black-box functions. Motivated by risk-averse decision-making fields such as agriculture, medicine, biology, or finance, we focus not on the mean payoff but on generic functionals of the return distribution. We provide a generic regret analysis of StoROO and illustrate its applicability with two examples: the optimization of quantiles and of CVaR. Inspired by the bandit literature and black-box mean optimizers, StoROO relies on the ability to construct confidence intervals for the targeted functional from random-size samples. We detail their construction in the case of quantiles, providing tight bounds based on the Kullback-Leibler divergence. Finally, we present numerical experiments showing that tight bounds have a dramatic impact on the optimization of quantiles and CVaR.
Pages: 268-283
Page count: 16
Related papers
26 in total
  • [1] Coherent measures of risk
    Artzner, P
    Delbaen, F
    Eber, JM
    Heath, D
    [J]. MATHEMATICAL FINANCE, 1999, 9 (03) : 203 - 228
  • [2] Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
    Audibert, Jean-Yves
    Munos, Remi
    Szepesvari, Csaba
    [J]. THEORETICAL COMPUTER SCIENCE, 2009, 410 (19) : 1876 - 1902
  • [3] Risk management with expectiles
    Bellini, Fabio
    Di Bernardino, Elena
    [J]. EUROPEAN JOURNAL OF FINANCE, 2017, 23 (06) : 487 - 506
  • [4] An old-new concept of convex risk measures: The optimized certainty equivalent
    Ben-Tal, Aharon
    Teboulle, Marc
    [J]. MATHEMATICAL FINANCE, 2007, 17 (03) : 449 - 476
  • [5] Bouttier Clement, 2017, Optimisation globale sous incertitudes: algorithmes stochastiques et bandits continus avec application a la planification de trajectoires d'avions
  • [6] Large deviations bounds for estimating conditional value-at-risk
    Brown, David B.
    [J]. OPERATIONS RESEARCH LETTERS, 2007, 35 (06) : 722 - 730
  • [7] Bubeck S, 2011, J MACH LEARN RES, V12, P1655
  • [8] David Y., 2016, JOINT EUR C MACH LEA, P556, DOI DOI 10.1007/978-3-319-46128-1_35
  • [9] Galichet N., 2013, AS C MACH LEARN, P245
  • [10] Garivier A., 2011, P 24 ANN C LEARN THE, P359