Fast generalization rates for distance metric learning: Improved theoretical analysis for smooth strongly convex distance metric learning

Cited by: 0
Authors
Han-Jia Ye
De-Chuan Zhan
Yuan Jiang
Institutions
[1] Nanjing University
Source
Machine Learning | 2019 / Volume 108
Keywords
Distance metric learning; Generalization analysis; Excess risk
DOI
Not available
CLC Number
Subject Classification Code
Abstract
Distance metric learning (DML) aims to find a suitable measure to compute a distance between instances. Facilitated by side information, the learned metric can often improve the performance of similarity- or distance-based methods such as kNN. Theoretical analyses of DML focus on the learning effectiveness of the squared Mahalanobis distance: specifically, whether the Mahalanobis metric learned from empirically sampled pairwise constraints agrees with the optimal metric obtained over paired samples generated from the true distribution, and what the sample complexity of this process is. The excess risk measures the quality of generalization, i.e., the gap between the expected objective of the empirical metric learned from a regularized objective with a convex loss function and that of the optimal metric. Given N training examples, existing analyses of this non-i.i.d. learning problem have shown that the excess risk of DML converges to zero at a rate of $${\mathcal {O}}\left( \frac{1}{\sqrt{N}}\right) $$. In this paper, we obtain a faster convergence rate for DML, $${\mathcal {O}}\left( \frac{1}{N}\right) $$, when the distance metric is learned with a smooth loss function and a strongly convex objective. In addition, when the problem is relatively easy and the number of training samples is large enough, this rate can be further improved to $${\mathcal {O}}\left( \frac{1}{N^2}\right) $$. Synthetic experiments validate that DML can achieve the stated faster generalization rates, and results under various settings shed further light on the theoretical properties of DML.
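As a rough guide to the quantities named in the abstract, the display below sketches a standard formulation of squared-Mahalanobis DML and its excess risk. The loss $\ell$, regularizer $\Omega$, and empirical objective $\hat{F}_N$ are generic placeholders standing in for the paper's precise pairwise-constraint setting, not a reproduction of it.

$$d_M^2(\mathbf{x},\mathbf{x}') = (\mathbf{x}-\mathbf{x}')^\top M (\mathbf{x}-\mathbf{x}'), \qquad M \succeq 0,$$

$$F(M) = \mathbb{E}_{(\mathbf{x},\mathbf{x}',y)}\!\left[\ell\big(y,\, d_M^2(\mathbf{x},\mathbf{x}')\big)\right] + \lambda\,\Omega(M), \qquad \hat{M}_N = \mathop{\arg\min}_{M \succeq 0} \hat{F}_N(M),$$

where $\hat{F}_N$ is the empirical counterpart of $F$ built from the pairwise constraints induced by the $N$ training examples. The excess risk and the rates discussed in the abstract then read

$$\mathcal {E}(\hat{M}_N) = F(\hat{M}_N) - \min_{M \succeq 0} F(M) = \begin{cases} {\mathcal {O}}\!\left(1/\sqrt{N}\right) & \text{convex loss (prior analyses)},\\ {\mathcal {O}}\!\left(1/N\right) & \text{smooth loss, strongly convex objective},\\ {\mathcal {O}}\!\left(1/N^2\right) & \text{additionally, easy problems with large } N. \end{cases}$$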
Pages: 267-295
Number of pages: 28