A Unifying Mutual Information View of Metric Learning: Cross-Entropy vs. Pairwise Losses

Cited by: 68
Authors
Boudiaf, Malik [1 ]
Rony, Jerome [1 ]
Ziko, Imtiaz Masud [1 ]
Granger, Eric [1 ]
Pedersoli, Marco [1 ]
Piantanida, Pablo [2 ]
Ben Ayed, Ismail [1 ]
Affiliations
[1] ETS Montreal, LIVIA, Montreal, PQ, Canada
[2] Univ Paris Saclay, L2S, Cent Supelec, CNRS, Paris, France
Source
COMPUTER VISION - ECCV 2020, PT VI | 2020 / Vol. 12351
Keywords
Metric learning; Deep learning; Information theory; Kernel
DOI
10.1007/978-3-030-58539-6_33
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Recently, substantial research efforts in Deep Metric Learning (DML) have focused on designing complex pairwise-distance losses, which require convoluted schemes to ease optimization, such as sample mining or pair weighting. The standard cross-entropy loss for classification has been largely overlooked in DML. On the surface, the cross-entropy may seem unrelated and irrelevant to metric learning, as it does not explicitly involve pairwise distances. However, we provide a theoretical analysis that links the cross-entropy to several well-known and recent pairwise losses. Our connections are drawn from two different perspectives: one based on an explicit optimization insight; the other on discriminative and generative views of the mutual information between the labels and the learned features. First, we explicitly demonstrate that the cross-entropy is an upper bound on a new pairwise loss, which has a structure similar to various pairwise losses: it minimizes intra-class distances while maximizing inter-class distances. As a result, minimizing the cross-entropy can be seen as an approximate bound-optimization (or Majorize-Minimize) algorithm for minimizing this pairwise loss. Second, we show that, more generally, minimizing the cross-entropy is actually equivalent to maximizing the mutual information, to which we connect several well-known pairwise losses. Furthermore, we show that various standard pairwise losses can be explicitly related to one another via bound relationships. Our findings indicate that the cross-entropy represents a proxy for maximizing the mutual information - as pairwise losses do - without the need for convoluted sample-mining heuristics. Our experiments (code available at https://github.com/jeromerony/dml_cross_entropy) over four standard DML benchmarks strongly support our findings. We obtain state-of-the-art results, outperforming recent and complex DML methods.
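To make the abstract's information-theoretic claim concrete, the following is a minimal sketch of the standard identities behind it, writing Z for the learned features, Y for the labels, p for the true label posterior and q for the model's softmax posterior; these symbols are notational assumptions here, and the paper's specific pairwise loss and bound constants are not reproduced.

\[ I(Z;Y) \;=\; \underbrace{H(Y) - H(Y \mid Z)}_{\text{discriminative view}} \;=\; \underbrace{H(Z) - H(Z \mid Y)}_{\text{generative view}} \]
\[ \mathcal{L}_{\mathrm{CE}} \;=\; \mathbb{E}_{Z}\!\left[ -\log q(Y \mid Z) \right] \;=\; H(Y \mid Z) \;+\; \mathbb{E}_{Z}\!\left[ D_{\mathrm{KL}}\!\left( p(\cdot \mid Z) \,\|\, q(\cdot \mid Z) \right) \right] \;\ge\; H(Y \mid Z) \]

Since H(Y) is fixed by the label distribution, driving the cross-entropy (and hence H(Y|Z)) down pushes I(Z;Y) up, which is the sense in which cross-entropy acts as a proxy for mutual-information maximization; pairwise losses instead operate on the generative view, tightening intra-class spread via H(Z|Y) while keeping the overall embedding entropy H(Z) large.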
Pages: 548-564
Number of pages: 17