Online Non-Convex Optimization with Imperfect Feedback

被引:0
作者
Heliou, Amelie [1 ]
Martin, Matthieu [1 ]
Mertikopoulos, Panayotis [1 ,2 ]
Rahier, Thibaud [1 ]
机构
[1] Criteo AI Lab, Paris, France
[2] Univ Grenoble Alpes, CNRS, INRIA, LIG, Grenoble, France
来源
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020 | 2020年 / 33卷
关键词
ALGORITHM; REGRET;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We consider the problem of online learning with non-convex losses. In terms of feedback, we assume that the learner observes - or otherwise constructs - an inexact model for the loss function encountered at each stage, and we propose a mixed-strategy learning policy based on dual averaging. In this general context, we derive a series of tight regret minimization guarantees, both for the learner's static (external) regret, as well as the regret incurred against the best dynamic policy in hindsight. Subsequently, we apply this general template to the case where the learner only has access to the actual loss incurred at each stage of the process. This is achieved by means of a kernel-based estimator which generates an inexact model for each round's loss function using only the learner's realized losses as input.
引用
收藏
页数:12
相关论文
共 57 条
[31]  
Hazan E, 2012, OPTIMIZATION FOR MACHINE LEARNING, P287
[32]  
Hazan Elad, 2017, ICML 17
[33]  
Hazan Elad, 2009, ICML 09
[34]   Tracking the best expert [J].
Herbster, M ;
Warmuth, MK .
MACHINE LEARNING, 1998, 32 (02) :151-178
[35]  
Jadbabaie Ali, 2015, AISTATS 15
[36]   Efficient algorithms for online decision problems [J].
Kalai, A ;
Vempala, S .
JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2005, 71 (03) :291-307
[37]  
Kleinberg R., 2004, Proc. Adv. Neural Inf. Process. Syst., V17, P697
[38]  
Kleinberg Robert David, 2008, STOC 08
[39]  
Krichene Walid, 2015, ICML 15
[40]   THE WEIGHTED MAJORITY ALGORITHM [J].
LITTLESTONE, N ;
WARMUTH, MK .
INFORMATION AND COMPUTATION, 1994, 108 (02) :212-261