A Unifying Mutual Information View of Metric Learning: Cross-Entropy vs. Pairwise Losses

Cited by: 68
Authors
Boudiaf, Malik [1 ]
Rony, Jerome [1 ]
Ziko, Imtiaz Masud [1 ]
Granger, Eric [1 ]
Pedersoli, Marco [1 ]
Piantanida, Pablo [2 ]
Ben Ayed, Ismail [1 ]
Affiliations
[1] ETS Montreal, LIVIA, Montreal, PQ, Canada
[2] Univ Paris Saclay, L2S, Cent Supelec, CNRS, Paris, France
Source
COMPUTER VISION - ECCV 2020, PT VI | 2020 / Vol. 12351
Keywords
Metric learning; Deep learning; Information theory; Kernel
DOI
10.1007/978-3-030-58539-6_33
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Recently, substantial research efforts in Deep Metric Learning (DML) have focused on designing complex pairwise-distance losses, which require convoluted schemes to ease optimization, such as sample mining or pair weighting. The standard cross-entropy loss for classification has been largely overlooked in DML. On the surface, the cross-entropy may seem unrelated and irrelevant to metric learning, as it does not explicitly involve pairwise distances. However, we provide a theoretical analysis that links the cross-entropy to several well-known and recent pairwise losses. Our connections are drawn from two different perspectives: one based on an explicit optimization insight; the other on discriminative and generative views of the mutual information between the labels and the learned features. First, we explicitly demonstrate that the cross-entropy is an upper bound on a new pairwise loss, which has a structure similar to various pairwise losses: it minimizes intra-class distances while maximizing inter-class distances. As a result, minimizing the cross-entropy can be seen as an approximate bound-optimization (or Majorize-Minimize) algorithm for minimizing this pairwise loss. Second, we show that, more generally, minimizing the cross-entropy is actually equivalent to maximizing the mutual information, to which we connect several well-known pairwise losses. Furthermore, we show that various standard pairwise losses can be explicitly related to one another via bound relationships. Our findings indicate that the cross-entropy represents a proxy for maximizing the mutual information - as pairwise losses do - without the need for convoluted sample-mining heuristics. Our experiments (code available at https://github.com/jeromerony/dml_cross_entropy) over four standard DML benchmarks strongly support our findings. We obtain state-of-the-art results, outperforming recent and complex DML methods.
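To make the abstract's information-theoretic claim concrete, the following is a minimal sketch of the standard identities behind it, writing Z for the learned features, Y for the labels, p for the true label posterior and q for the model's softmax posterior; these symbols are notational assumptions here, and the paper's specific pairwise loss and bound constants are not reproduced.

\[ I(Z;Y) \;=\; \underbrace{H(Y) - H(Y \mid Z)}_{\text{discriminative view}} \;=\; \underbrace{H(Z) - H(Z \mid Y)}_{\text{generative view}} \]
\[ \mathcal{L}_{\mathrm{CE}} \;=\; \mathbb{E}_{Z}\!\left[ -\log q(Y \mid Z) \right] \;=\; H(Y \mid Z) \;+\; \mathbb{E}_{Z}\!\left[ D_{\mathrm{KL}}\!\left( p(\cdot \mid Z) \,\|\, q(\cdot \mid Z) \right) \right] \;\ge\; H(Y \mid Z) \]

Since H(Y) is fixed by the label distribution, driving the cross-entropy (and hence H(Y|Z)) down pushes I(Z;Y) up, which is the sense in which cross-entropy acts as a proxy for mutual-information maximization; pairwise losses instead operate on the generative view, tightening intra-class spread via H(Z|Y) while keeping the overall embedding entropy H(Z) large.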
Pages: 548-564
Number of pages: 17