Tighter PAC-Bayes Bounds Through Coin-Betting

Cited by: 0
Authors
Jang, Kyoungseok [1 ]
Jun, Kwang-Sung [1 ]
Kuzborskij, Ilja [2 ]
Orabona, Francesco [3 ]
Affiliations
[1] Univ Arizona, Tucson, AZ 85721 USA
[2] Google DeepMind, London, England
[3] Boston Univ, Boston, MA 02215 USA
Source
THIRTY SIXTH ANNUAL CONFERENCE ON LEARNING THEORY, VOL 195 | 2023 / Vol. 195
Funding
U.S. National Science Foundation
Keywords
Concentration inequalities; PAC-Bayes; confidence sequences; coin-betting;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
We consider the problem of estimating the mean of a sequence of random elements f(θ, X_1), ..., f(θ, X_n), where f is a fixed scalar function, S = (X_1, ..., X_n) are independent random variables, and θ is a possibly S-dependent parameter. An example of such a problem would be to estimate the generalization error of a neural network trained on n examples, where f is a loss function. Classically, this problem is approached through concentration inequalities holding uniformly over compact parameter sets of functions f, for example as in Rademacher- or VC-type analyses. However, in many problems such inequalities often yield numerically vacuous estimates. Recently, the PAC-Bayes framework has been proposed as a better alternative for this class of problems, owing to its ability to often give numerically non-vacuous bounds. In this paper, we show that we can do even better: we show how to refine the proof strategy of PAC-Bayes bounds and achieve even tighter guarantees. Our approach is based on the coin-betting framework, which derives the numerically tightest known time-uniform concentration inequalities from the regret guarantees of online gambling algorithms. In particular, we derive the first PAC-Bayes concentration inequality based on the coin-betting approach that holds simultaneously for all sample sizes. We demonstrate its tightness by showing that relaxing it recovers a number of previous results in closed form, including the Bernoulli-KL and empirical Bernstein inequalities. Finally, we propose an efficient algorithm to numerically calculate confidence sequences from our bound, which often generates non-vacuous confidence bounds even with one sample, unlike state-of-the-art PAC-Bayes bounds.
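The coin-betting idea underlying the abstract can be illustrated with a minimal sketch (this is not the paper's algorithm; the function name, the fixed betting fraction, and the candidate-mean grid are illustrative assumptions): for each candidate mean m, a gambler bets a fixed fraction of their wealth on each sample; when m is the true mean the wealth process is a nonnegative martingale, so by Ville's inequality it exceeds 1/delta with probability at most delta, and the confidence set keeps every m whose wealth stays below that threshold.

```python
import numpy as np

def betting_confidence_set(xs, delta=0.05, grid=401):
    """Coin-betting confidence set for the mean of i.i.d. samples in [0, 1].

    For each candidate mean m, the gambler makes two hedged bets per sample
    ("true mean > m" and "true mean < m").  When m is the true mean, the
    hedged wealth is a nonnegative martingale, so by Ville's inequality it
    exceeds 1/delta with probability at most delta; m is retained while
    its wealth stays below 1/delta.
    """
    xs = np.asarray(xs, dtype=float)
    lam = 0.5  # fixed betting fraction; |x - m| <= 1 keeps wealth positive
    kept = []
    for m in np.linspace(0.0, 1.0, grid):
        w_up = np.prod(1.0 + lam * (xs - m))  # bets "true mean > m"
        w_dn = np.prod(1.0 - lam * (xs - m))  # bets "true mean < m"
        wealth = 0.5 * (w_up + w_dn)          # hedged average, still a martingale
        if wealth < 1.0 / delta:
            kept.append(m)
    return (min(kept), max(kept)) if kept else (None, None)

rng = np.random.default_rng(0)
samples = rng.binomial(1, 0.3, size=500)  # Bernoulli(0.3) data
lo, hi = betting_confidence_set(samples)
```

A fixed betting fraction is the simplest valid choice; the paper's contribution is, roughly, to replace it with an adaptive strategy whose regret guarantee yields a tighter, all-sample-size PAC-Bayes bound.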
Pages: 25