Local Intrinsic Dimensionality II: Multivariate Analysis and Distributional

Cited by: 28
Authors
Houle, Michael E. [1]
Affiliations
[1] Natl Inst Informat, Chiyoda Ku, 2-1-2 Hitotsubashi, Tokyo 1018430, Japan
Source
SIMILARITY SEARCH AND APPLICATIONS, SISAP 2017 | 2017 / Vol. 10609
Keywords
SIMILARITY SEARCH;
DOI
10.1007/978-3-319-68474-1_6
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Distance-based expansion models of intrinsic dimensionality have had recent application in the analysis of complexity of similarity applications, and in the design of efficient heuristics. This theory paper extends one such model, the local intrinsic dimension (LID), to a multivariate form that can account for the contributions of different distributional components towards the intrinsic dimensionality of the entire feature set, or equivalently towards the discriminability of distance measures defined in terms of these feature combinations. Formulas are established for the effect on LID under summation, product, composition, and convolution operations on smooth functions in general, and cumulative distribution functions in particular. For some of these operations, the dimensional or discriminability characteristics of the result are also shown to depend on a form of distributional support. As an example, an analysis is provided that quantifies the impact of introduced random Gaussian noise on the intrinsic dimension of data. Finally, a theoretical relationship is established between the LID model and the classical correlation dimension.
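The operations named in the abstract can be illustrated numerically. The sketch below is not taken from the paper: it assumes the common LID formulation in which the indiscriminability of a smooth function f at radius r is r * f'(r) / f(r), and it checks the identities that elementary calculus then gives for products, sums, and compositions. The functions `F` and `G` and the helper `indiscriminability` are hypothetical names chosen for illustration; convolution is not covered.

```python
def indiscriminability(f, r, h=1e-6):
    """Approximate r * f'(r) / f(r) using a central finite difference."""
    derivative = (f(r + h) - f(r - h)) / (2.0 * h)
    return r * derivative / f(r)


# Hypothetical smooth functions with power-law growth (assumed for illustration).
def F(r):
    return r ** 2.0  # r * F'(r) / F(r) equals 2 for every r > 0


def G(r):
    return r ** 3.0  # r * G'(r) / G(r) equals 3 for every r > 0


r = 0.5
id_F = indiscriminability(F, r)
id_G = indiscriminability(G, r)

# Product: the log-derivative of F * G is the sum of the log-derivatives.
id_product = indiscriminability(lambda x: F(x) * G(x), r)
expected_product = id_F + id_G

# Sum: an average of the two values, weighted by F(r) and G(r).
id_sum = indiscriminability(lambda x: F(x) + G(x), r)
expected_sum = (F(r) * id_F + G(r) * id_G) / (F(r) + G(r))

# Composition: chain rule, with the outer value taken at G(r).
id_composition = indiscriminability(lambda x: F(G(x)), r)
expected_composition = indiscriminability(F, G(r)) * id_G

print(id_product, expected_product)
print(id_sum, expected_sum)
print(id_composition, expected_composition)
```

Each printed pair should agree up to finite-difference error. The exact statements and conditions in the paper itself may differ from this simplified check.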
Pages: 80-95
Number of pages: 16