Correlations of Cross-Entropy Loss in Machine Learning

Cited by: 10
Authors
Connor, Richard [1]
Dearle, Alan [1]
Claydon, Ben [1]
Vadicamo, Lucia [2]
Affiliations
[1] University of St Andrews, School of Computer Science, St Andrews KY16 9SS, Scotland
[2] Italian National Research Council (CNR), Institute of Information Science and Technologies, I-56124 Pisa, Italy
Keywords
softmax; cross-entropy; f-divergence; Kullback-Leibler divergence; Jensen-Shannon divergence; triangular divergence; inequalities
DOI
10.3390/e26060491
Chinese Library Classification (CLC) number
O4 [Physics]
Subject classification code
0702
Abstract
Cross-entropy loss is crucial in training many deep neural networks. In this context, we show a number of novel and strong correlations among various related divergence functions. In particular, we demonstrate that, in some circumstances, (a) cross-entropy is almost perfectly correlated with the little-known triangular divergence, and (b) cross-entropy is strongly correlated with the Euclidean distance over the logits from which the softmax is derived. The consequences of these observations are as follows. First, triangular divergence may be used as a cheaper alternative to cross-entropy. Second, logits can be used as features in a Euclidean space which is strongly synergistic with the classification process. This justifies the use of Euclidean distance over logits as a measure of similarity, in cases where the network is trained using softmax and cross-entropy. We establish these correlations via empirical observation, supported by a mathematical explanation encompassing a number of strongly related divergence functions.
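To make claim (a) concrete, here is a short worked derivation for the common training case where the target is a one-hot vector. The notation is an assumption for illustration, not taken from the paper: z is the logit vector, q = softmax(z), e_c is the one-hot target for the true class c, H is cross-entropy and Δ is the triangular divergence. In this setting both quantities depend only on q_c and both decrease strictly as q_c grows, so they are perfectly rank-correlated. This is offered as one simple reason such a strong correlation can arise, not as the authors' own derivation.

```latex
\begin{aligned}
q_i &= \frac{e^{z_i}}{\sum_j e^{z_j}}, \qquad
H(p, q) = -\sum_i p_i \log q_i, \qquad
\Delta(p, q) = \sum_i \frac{(p_i - q_i)^2}{p_i + q_i}, \\
H(e_c, q) &= -\log q_c, \\
\Delta(e_c, q) &= \frac{(1 - q_c)^2}{1 + q_c} + \sum_{i \neq c} q_i
               = \frac{(1 - q_c)^2}{1 + q_c} + (1 - q_c)
               = \frac{2\,(1 - q_c)}{1 + q_c}.
\end{aligned}
```

Both -\log q_c and 2(1 - q_c)/(1 + q_c) are strictly decreasing on q_c \in (0, 1], which is why ranking samples by either one gives the same order.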
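The quantities named in the abstract can be explored with a small standard-library sketch: softmax over logits, cross-entropy, triangular divergence, Euclidean distance over the logits, and the Pearson correlation between them over randomly generated logit pairs. The experimental setup is entirely an assumption for illustration (Gaussian logits, 10 classes, comparing the softmax outputs of two logit vectors, and hypothetical helper names such as `sample_correlations`). It is not the authors' protocol and makes no claim about the correlation values the paper reports.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i * log(q_i); terms with p_i == 0 contribute nothing."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def triangular_divergence(p, q):
    """Triangular divergence sum_i (p_i - q_i)^2 / (p_i + q_i), skipping 0/0 terms."""
    return sum((pi - qi) ** 2 / (pi + qi) for pi, qi in zip(p, q) if pi + qi > 0)


def euclidean(x, y):
    """Euclidean (L2) distance between two equal-length vectors, e.g. logits."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def sample_correlations(n_pairs=2000, n_classes=10, scale=3.0, seed=0):
    """Hypothetical experiment (assumption): draw random Gaussian logit pairs
    (x, y), compare softmax(x) with softmax(y), and return
    (Pearson(CE, triangular), Pearson(CE, Euclidean distance over logits))."""
    rng = random.Random(seed)
    ce, td, dist = [], [], []
    for _ in range(n_pairs):
        x = [rng.gauss(0.0, scale) for _ in range(n_classes)]
        y = [rng.gauss(0.0, scale) for _ in range(n_classes)]
        p, q = softmax(x), softmax(y)
        ce.append(cross_entropy(p, q))
        td.append(triangular_divergence(p, q))
        dist.append(euclidean(x, y))
    return pearson(ce, td), pearson(ce, dist)


if __name__ == "__main__":
    r_td, r_dist = sample_correlations()
    print(f"Pearson(cross-entropy, triangular)      = {r_td:.3f}")
    print(f"Pearson(cross-entropy, logit Euclidean) = {r_dist:.3f}")
```

Changing `scale` or `n_classes` changes how peaked the softmax outputs are, so the printed correlations will vary; treat them as an exploration aid rather than a reproduction of the paper's experiments. The explicit loops keep the sketch dependency-free; with NumPy the same computations vectorize directly.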
Pages: 16