Improving generalization of deep neural networks by leveraging margin distribution

被引:9
作者
Lyu, Shen-Huan [1 ]
Wang, Lu [1 ]
Zhou, Zhi-Hua [1 ]
机构
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210023, Peoples R China
关键词
Deep neural network; Margin theory; Generalization;
D O I
10.1016/j.neunet.2022.03.019
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Recent research has used margin theory to analyze the generalization performance for deep neural networks (DNNs). The existed results are almost based on the spectrally-normalized minimum margin. However, optimizing the minimum margin ignores a mass of information about the entire margin distribution, which is crucial to generalization performance. In this paper, we prove a generalization upper bound dominated by the statistics of the entire margin distribution. Compared with the minimum margin bounds, our bound highlights an important measure for controlling the complexity, which is the ratio of the margin standard deviation to the expected margin. We utilize a convex margin distribution loss function on the deep neural networks to validate our theoretical results by optimizing the margin ratio. Experiments and visualizations confirm the effectiveness of our approach and the correlation between generalization gap and margin ratio. (c) 2022 Elsevier Ltd. All rights reserved.
引用
收藏
页码:48 / 60
页数:13
相关论文
共 68 条
  • [41] Neyshabur B., 2018, 6 INT C LEARN REPR
  • [42] Neyshabur B, 2017, ADV NEUR IN, V30
  • [43] Pan SJ, 2009, 21ST INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI-09), PROCEEDINGS, P1187
  • [44] Practical Black-Box Attacks against Machine Learning
    Papernot, Nicolas
    McDaniel, Patrick
    Goodfellow, Ian
    Jha, Somesh
    Celik, Z. Berkay
    Swami, Ananthram
    [J]. PROCEEDINGS OF THE 2017 ACM ASIA CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY (ASIA CCS'17), 2017, : 506 - 519
  • [45] Paszke A, 2019, ADV NEUR IN, V32
  • [46] Reyzin L, 2006, P 23 INT C MACH LEAR, P753, DOI DOI 10.1145/1143844.1143939
  • [47] Beyond Sharing Weights for Deep Domain Adaptation
    Rozantsev, Artem
    Salzmann, Mathieu
    Fua, Pascal
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2019, 41 (04) : 801 - 814
  • [48] Deep learning
    Rusk, Nicole
    [J]. NATURE METHODS, 2016, 13 (01) : 35 - 35
  • [49] ImageNet Large Scale Visual Recognition Challenge
    Russakovsky, Olga
    Deng, Jia
    Su, Hao
    Krause, Jonathan
    Satheesh, Sanjeev
    Ma, Sean
    Huang, Zhiheng
    Karpathy, Andrej
    Khosla, Aditya
    Bernstein, Michael
    Berg, Alexander C.
    Fei-Fei, Li
    [J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2015, 115 (03) : 211 - 252
  • [50] Schapire R.E., 1997, INT C MACH LEARN ICM, P322