Tightening mutual information-based bounds on generalization error

Cited by: 73
Authors
Bu Y. [1, 2, 3]
Zou S. [4]
Veeravalli V.V. [1, 2]
Affiliations
[1] The ECE Department, University of Illinois at Urbana–Champaign, Urbana, IL 61801
[2] The Coordinated Science Laboratory, University of Illinois at Urbana–Champaign, Urbana, IL 61801
[3] The Institute for Data, Systems, and Society, Massachusetts Institute of Technology, Cambridge, MA 02139
[4] The Department of Electrical Engineering, University at Buffalo, The State University of New York, Buffalo, NY 14228
Source
IEEE Journal on Selected Areas in Information Theory | 2020, Vol. 1, No. 1
Keywords
Cumulant generating function; Generalization error; Information-theoretic bounds; Stochastic gradient Langevin dynamics
DOI
10.1109/JSAIT.2020.2991139
Abstract
An information-theoretic upper bound on the generalization error of supervised learning algorithms is derived. The bound is constructed in terms of the mutual information between each individual training sample and the output of the learning algorithm. The bound is derived under more general conditions on the loss function than in existing studies; nevertheless, it provides a tighter characterization of the generalization error. Examples of learning algorithms are provided to demonstrate the tightness of the bound, and to show that it has a broad range of applicability. Application to noisy and iterative algorithms, e.g., stochastic gradient Langevin dynamics (SGLD), is also studied, where the constructed bound provides a tighter characterization of the generalization error than existing results. Finally, it is demonstrated that, unlike existing bounds, which are difficult to compute and evaluate empirically, the proposed bound can be estimated easily in practice. © 2020 IEEE.
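A minimal sketch of the bound described above, under the common simplifying assumption that the loss is σ-sub-Gaussian (the paper's actual conditions on the cumulant generating function of the loss are more general): with i.i.d. training samples S = (Z_1, ..., Z_n) drawn from μ and algorithm output W, the individual-sample mutual information bound takes the form

  % Hedged sketch: per-sample MI bound for a sigma-sub-Gaussian loss,
  % alongside the classical full-sample bound that it tightens.
  \[
    \bigl|\mathbb{E}\,\mathrm{gen}(\mu, P_{W\mid S})\bigr|
      \;\le\; \frac{1}{n}\sum_{i=1}^{n}\sqrt{2\sigma^{2}\, I(W; Z_i)}
      \;\le\; \sqrt{\frac{2\sigma^{2}}{n}\, I(W; S)}.
  \]

The second inequality follows from Jensen's inequality together with the independence of the Z_i (which gives \sum_i I(W; Z_i) \le I(W; S)), so the per-sample form can only tighten the classical full-sample bound of Xu and Raginsky; it can also remain finite in cases where I(W; S) is infinite, e.g., when W is a deterministic function of S.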
Pages: 121-130
Page count: 10