Dimensionality, Discriminability, Density & Distance Distributions

被引:56
作者
Houle, Michael E. [1 ]
机构
[1] Res Org Informat & Syst, Natl Inst Informat, Chiyoda Ku, Tokyo 1018430, Japan
来源
2013 IEEE 13TH INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW) | 2013年
关键词
Intrinsic dimensionality; distance distribution; discriminability; features; statistical copula;
D O I
10.1109/ICDMW.2013.139
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
For many large-scale applications in data mining, machine learning, and multimedia, fundamental operations such as similarity search, retrieval, classification, clustering, and anomaly detection generally suffer from an effect known as the 'curse of dimensionality'. As the dimensionality of the data increases, distance values tend to become less discriminative, due to their increasing relative concentration about the mean of their distribution. For this reason, researchers have considered the analysis of structures and methods in terms of measures of the intrinsic dimensionality of the data sets. This paper is concerned with a generalization of a discrete measure of intrinsic dimensionality, the expansion dimension, to the case of continuous distance distributions. This notion of the intrinsic dimensionality of a distribution is shown to precisely coincide with a natural notion of the incliscriminability of distances and features. Furthermore, for any distance distribution with differentiable cumulative density function, a fundamental relationship is shown to exist between probability density, the cumulative density (cumulative probability divided by distance), intrinsic dimensionality, and discriminability.
引用
收藏
页码:468 / 473
页数:6
相关论文
共 26 条
[1]  
[Anonymous], 2001, Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation
[2]  
[Anonymous], 2002, NIPS
[3]  
[Anonymous], 1998, Feature Extraction, Construction and Selection: A Data Mining Perspective
[4]  
Beyer K, 1999, LECT NOTES COMPUT SC, V1540, P217
[5]  
Beygelzimer A., 2006, ICML, DOI DOI 10.1145/1143844.1143857
[6]   LOF: Identifying density-based local outliers [J].
Breunig, MM ;
Kriegel, HP ;
Ng, RT ;
Sander, J .
SIGMOD RECORD, 2000, 29 (02) :93-104
[7]  
Davis J.V., 2007, P 24 INT C MACH LEAR, P209, DOI DOI 10.1145/1273496.1273523
[8]  
de Vries T., 2010, Proceedings 2010 10th IEEE International Conference on Data Mining (ICDM 2010), P128, DOI 10.1109/ICDM.2010.151
[9]  
Ester M., 1996, KDD-96 Proceedings. Second International Conference on Knowledge Discovery and Data Mining, P226
[10]   Bounded geometries, fractals, and low-distortion embeddings [J].
Gupta, A ;
Krauthgamer, R ;
Lee, JR .
44TH ANNUAL IEEE SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 2003, :534-543