Effectiveness of the Euclidean distance in high dimensional spaces

被引:42
作者
Xia, Shuyin [1 ]
Xiong, Zhongyang [1 ]
Luo, Yueguo [1 ,3 ]
Xu, Wei [2 ]
Zhang, Guanghua [1 ]
机构
[1] Chongqing Univ, Coll Comp Sci, Chongqing 400044, Peoples R China
[2] South Cent Univ Nationalities, Coll Comp Sci, Wuhan 400015, Peoples R China
[3] Yangtze Normal Univ, Network Informat Ctr, Chongqing 408100, Peoples R China
来源
OPTIK | 2015年 / 126卷 / 24期
关键词
Euclidean distance; High dimensional space; Independent and identical distributed;
D O I
10.1016/j.ijleo.2015.09.093
中图分类号
O43 [光学];
学科分类号
070207 ; 0803 ;
摘要
This paper presents analysis of applicability and performance of the Euclidean distance in relation to the dimensionality of the space. The effect of dimensionality on the behavior of Euclidean distance is explored; Furthermore, it is shown that the minimum distance approaches the maximum distance under a broader set of conditions without requiring the calculation of variance of random variables. It is demonstrated that the minimum distance approaches the maximum distance even for some low dimensional distributions, such as normal distribution. Many proposed measures not based directly on Euclidean distance cannot enlarger the difference between closest point and farthest point. The analysis has been performed on a wide range of artificial and publicly available datasets. As the variables of different distributions have different convergence rates, the results should not be interpreted to mean that the Euclidean distance is not applicable. In fact, it is shown in experiments that the Euclidean distance is very useful in the noncentral t-distribution even for the dimensionality higher than 10,000. Furthermore, it is observed that the behavior of Euclidean distance becomes more useful with increased number of samples. (C) 2015 Elsevier GmbH. All rights reserved.
引用
收藏
页码:5614 / 5619
页数:6
相关论文
共 18 条
[1]  
Abramowitz M., 1972, TECHNOMETRICS, V34, P78
[2]  
[Anonymous], 2001, P ICDT
[3]   Robust high performance reinforcement learning through weighted k-nearest neighbors [J].
Antonio Martin H, Jose ;
de Lope, Javier ;
Maravall, Dario .
NEUROCOMPUTING, 2011, 74 (08) :1251-1259
[4]   High-dimensional labeled data analysis with topology representing graphs [J].
Aupetit, M ;
Catz, T .
NEUROCOMPUTING, 2005, 63 :139-169
[5]   A novel attribute weighting algorithm for clustering high-dimensional categorical data [J].
Bai, Liang ;
Liang, Jiye ;
Dang, Chuangyin ;
Cao, Fuyuan .
PATTERN RECOGNITION, 2011, 44 (12) :2843-2861
[6]  
Beyer K, 1999, P ICDT
[7]  
Hans-Peter Kriegel, 2008, P 14 ACM SIGKDD INT
[8]  
HINNEBURG A, 2000, P VLDB
[9]   Analysis of High-Pitched Phonation Using Three-Dimensional Computed Tomography [J].
Hiramatsu, Hiroyuki ;
Tokashiki, Ryoji ;
Nakamura, Hirokazu ;
Motohashi, Ray ;
Sakurai, Eriko ;
Nomoto, Masaki ;
Toyomura, Fumimasa ;
Suzuki, Mamoru .
JOURNAL OF VOICE, 2012, 26 (05) :548-554
[10]   High-dimensional mean estimation via l1 penalized normal likelihood [J].
Katayama, Shota .
JOURNAL OF MULTIVARIATE ANALYSIS, 2014, 130 :90-106