Scikit-Dimension: A Python Package for Intrinsic Dimension Estimation

Cited by: 43
Authors
Bac, Jonathan [1, 2, 3]
Mirkes, Evgeny M. [4, 5]
Gorban, Alexander N. [4, 5]
Tyukin, Ivan [4, 5]
Zinovyev, Andrei [1, 2, 3, 5]
Affiliations
[1] PSL Res Univ, Inst Curie, F-75248 Paris, France
[2] INSERM, U900, F-75248 Paris, France
[3] PSL Res Univ, Mines ParisTech, CBIO Ctr Computat Biol, F-75272 Paris, France
[4] Univ Leicester, Dept Math, Leicester LE1 7RH, Leics, England
[5] Lobachevsky Univ, Lab Adv Methods High Dimens Data Anal, Nizhnii Novgorod 603105, Russia
Funding
Engineering and Physical Sciences Research Council (UK)
Keywords
intrinsic dimension; effective dimension; Python package; method benchmarking; principal component analysis
DOI
10.3390/e23101368
Chinese Library Classification number
O4 [Physics]
Discipline classification code
0702
Abstract
Dealing with uncertainty in applications of machine learning to real-life data critically depends on the knowledge of intrinsic dimensionality (ID). A number of methods have been suggested for the purpose of estimating ID, but no standard package to easily apply them one by one or all at once has been implemented in Python. This technical note introduces scikit-dimension, an open-source Python package for intrinsic dimension estimation. The scikit-dimension package provides a uniform implementation of most of the known ID estimators based on the scikit-learn application programming interface to evaluate the global and local intrinsic dimension, as well as generators of synthetic toy and benchmark datasets widespread in the literature. The package is developed with tools assessing the code quality, coverage, unit testing and continuous integration. We briefly describe the package and demonstrate its use in a large-scale (more than 500 datasets) benchmarking of methods for ID estimation for real-life and synthetic data.
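To make the idea of a global ID estimator with a scikit-learn-style interface concrete, here is a minimal sketch in pure Python. It is not the package's implementation or API: the class `TwoNNSketch` and the helper `plane_in_5d` are hypothetical names introduced only for illustration. The estimator uses the nearest-neighbour distance ratio approach known as TwoNN in its maximum-likelihood form, chosen here only because it is short. The `fit` method and the trailing-underscore attribute `dimension_` follow the general scikit-learn naming convention for fitted attributes.

```python
import math
import random


class TwoNNSketch:
    """Hypothetical scikit-learn-style global ID estimator (illustration only)."""

    def fit(self, X):
        log_ratios = []
        for i, p in enumerate(X):
            # Track the distances to the nearest and second-nearest neighbour of p.
            r1 = r2 = math.inf
            for j, q in enumerate(X):
                if i == j:
                    continue
                d = math.dist(p, q)
                if d < r1:
                    r1, r2 = d, r1
                elif d < r2:
                    r2 = d
            # Skip exact duplicates, for which the ratio r2 / r1 is undefined.
            if r1 > 0:
                log_ratios.append(math.log(r2 / r1))
        # Maximum-likelihood estimate: number of points over the sum of log ratios.
        self.dimension_ = len(log_ratios) / sum(log_ratios)
        return self


def plane_in_5d(n, seed=0):
    """Toy dataset (assumption): a uniformly sampled 2D square embedded linearly in 5D."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        u, v = rng.random(), rng.random()
        points.append((u, v, u + v, u - v, 0.0))
    return points


X = plane_in_5d(400)
model = TwoNNSketch().fit(X)
print(round(model.dimension_, 2))  # expected to be close to 2, the true ID
```

The data live in a 5-dimensional ambient space, but the estimate should be close to 2 because the points lie on a plane. The brute-force neighbour search is quadratic in the number of points and is only suitable for small examples like this one.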
Pages: 12