Accurate Estimation of the Intrinsic Dimension Using Graph Distances: Unraveling the Geometric Complexity of Datasets

被引:31
作者
Granata, Daniele [1 ]
Carnevale, Vincenzo [1 ]
机构
[1] Coll Sci & Technol, ICMS, Philadelphia, PA 19122 USA
来源
SCIENTIFIC REPORTS | 2016年 / 6卷
基金
美国国家科学基金会; 美国国家卫生研究院;
关键词
REDUCTION; ENTROPY;
D O I
10.1038/srep31377
中图分类号
O [数理科学和化学]; P [天文学、地球科学]; Q [生物科学]; N [自然科学总论];
学科分类号
07 ; 0710 ; 09 ;
摘要
The collective behavior of a large number of degrees of freedom can be often described by a handful of variables. This observation justifies the use of dimensionality reduction approaches to model complex systems and motivates the search for a small set of relevant "collective" variables. Here, we analyze this issue by focusing on the optimal number of variable needed to capture the salient features of a generic dataset and develop a novel estimator for the intrinsic dimension (ID). By approximating geodesics with minimum distance paths on a graph, we analyze the distribution of pairwise distances around the maximum and exploit its dependency on the dimensionality to obtain an ID estimate. We show that the estimator does not depend on the shape of the intrinsic manifold and is highly accurate, even for exceedingly small sample sizes. We apply the method to several relevant datasets from image recognition databases and protein multiple sequence alignments and discuss possible interpretations for the estimated dimension in light of the correlations among input variables and of the information content of the dataset.
引用
收藏
页数:12
相关论文
共 33 条
  • [21] Mandelbrot B. B., 1977, STAT MECH STAT METHO, P331, DOI [10.1007/978-1-4613-4166-6_15, DOI 10.1007/978-1-4613-4166-6_15]
  • [22] Fractal-Based Intrinsic Dimension Estimation and Its Application in Dimensionality Reduction
    Mo, Dengyao
    Huang, Samuel H.
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2012, 24 (01) : 59 - 71
  • [23] Direct-coupling analysis of residue coevolution captures native contacts across many protein families
    Morcos, Faruck
    Pagnani, Andrea
    Lunt, Bryan
    Bertolino, Arianna
    Marks, Debora S.
    Sander, Chris
    Zecchina, Riccardo
    Onuchic, Jose N.
    Hwa, Terence
    Weigt, Martin
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2011, 108 (49) : E1293 - E1301
  • [24] Evolutionary imprint of activation: The design principles of VSDs
    Palovcak, Eugene
    Delemotte, Lucie
    Klein, Michael L.
    Carnevale, Vincenzo
    [J]. JOURNAL OF GENERAL PHYSIOLOGY, 2014, 143 (02) : 145 - 156
  • [25] INTRINSIC DIMENSIONALITY ESTIMATOR FROM NEAR-NEIGHBOR INFORMATION
    PETTIS, KW
    BAILEY, TA
    JAIN, AK
    DUBES, RC
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1979, 1 (01) : 25 - 37
  • [26] Philip J., 2008, DISTANCE 2 RANDOM PO
  • [27] Nonlinear dimensionality reduction by locally linear embedding
    Roweis, ST
    Saul, LK
    [J]. SCIENCE, 2000, 290 (5500) : 2323 - +
  • [28] Novel high intrinsic dimensionality estimators
    Rozza, A.
    Lombardi, G.
    Ceruti, C.
    Casiraghi, E.
    Campadelli, P.
    [J]. MACHINE LEARNING, 2012, 89 (1-2) : 37 - 65
  • [29] INTRINSIC LIMITS ON DIMENSION CALCULATIONS
    SMITH, LA
    [J]. PHYSICS LETTERS A, 1988, 133 (06) : 283 - 288
  • [30] A global geometric framework for nonlinear dimensionality reduction
    Tenenbaum, JB
    de Silva, V
    Langford, JC
    [J]. SCIENCE, 2000, 290 (5500) : 2319 - +