Generalized Expansion Dimension

Times cited: 37
Authors
Houle, Michael E. [1]
Kashima, Hisashi
Nett, Michael [1]
Affiliation
[1] National Institute of Informatics, Tokyo 101-8430, Japan
Source
12th IEEE International Conference on Data Mining Workshops (ICDMW 2012) | 2012
DOI
10.1109/ICDMW.2012.94
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a framework for modeling the intrinsic dimensionality of data sets. The models can be viewed as generalizations of the expansion dimension, which was originally proposed for the analysis of certain similarity search indices using the Euclidean distance metric. Here, we extend the original model to other metric spaces: vector spaces with the L_p or vector angle (cosine similarity) distance measures, as well as product spaces for categorical data. We also provide a practical guide for estimating both local and global intrinsic dimensionality. The estimates of data complexity can subsequently be used in the design and analysis of algorithms for data mining applications such as search, clustering, classification, and outlier detection.
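The idea behind expansion-style dimension estimates can be illustrated with a small sketch. Assume that the number of points within distance r of a query grows like c·r^m. Then two nested balls of radii r1 < r2 holding k1 < k2 points satisfy k2/k1 = (r2/r1)^m, which gives m = ln(k2/k1) / ln(r2/r1). Because L_p balls in m dimensions have volume proportional to r^m, the same ratio applies for any p. The code below is a hedged illustration under that assumption only. It does not reproduce the paper's models for vector-angle distance or categorical product spaces. The choice of the k-th and 2k-th neighbor distances as r1 and r2, the median as the global aggregate, and every function name (`expansion_dimension`, `local_estimate`, `global_estimate`) are hypothetical choices made for this example.

```python
import math
import random
from typing import Sequence

Point = Sequence[float]


def expansion_dimension(k1: int, r1: float, k2: int, r2: float) -> float:
    """Solve k2/k1 = (r2/r1)**m for m, i.e. m = ln(k2/k1) / ln(r2/r1)."""
    if not (0 < k1 < k2) or not (0 < r1 < r2):
        raise ValueError("need 0 < k1 < k2 and 0 < r1 < r2")
    return math.log(k2 / k1) / math.log(r2 / r1)


def lp_distance(x: Point, y: Point, p: float = 2.0) -> float:
    """L_p distance between two vectors; p = 2 gives the Euclidean metric."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def local_estimate(query: Point, data: list, k: int, p: float = 2.0) -> float:
    """Local estimate at `query` from its k-th and 2k-th nearest-neighbor
    distances (a hypothetical choice of the two radii)."""
    if len(data) < 2 * k:
        raise ValueError("need at least 2k data points")
    dists = sorted(lp_distance(query, x, p) for x in data)
    return expansion_dimension(k, dists[k - 1], 2 * k, dists[2 * k - 1])


def global_estimate(data: list, k: int, p: float = 2.0,
                    n_queries: int = 50, seed: int = 0) -> float:
    """Median of local estimates over a random sample of data points used as
    queries; each query is excluded from its own neighbor search."""
    rng = random.Random(seed)
    queries = rng.sample(data, min(n_queries, len(data)))
    estimates = sorted(
        local_estimate(q, [x for x in data if x is not q], k, p)
        for q in queries
    )
    return estimates[len(estimates) // 2]


# Usage: 2000 Gaussian points with 3 intrinsic dimensions,
# padded with 7 zero coordinates to live in a 10-dimensional space.
gen = random.Random(1)
data = [tuple(gen.gauss(0.0, 1.0) for _ in range(3)) + (0.0,) * 7
        for _ in range(2000)]
print(global_estimate(data, k=10))
```

For this synthetic data the global estimate should land near the intrinsic dimension 3 rather than the ambient dimension 10. Individual local estimates with small k are noisy, which is why the sketch aggregates many of them with a median rather than trusting any single query.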
Pages: 587-594
Number of pages: 8