Analyzing Quality Measurements for Dimensionality Reduction

Cited by: 1
Authors
Thrun, Michael C. [1, 2]
Maerte, Julian [1]
Stier, Quirin [2]
Affiliations
[1] Philipps Univ Marburg, Math & Comp Sci, Hans Meerwein Str 6, D-35043 Marburg, Germany
[2] IAP GmbH Intelligent Analyt Projects, Birken 10a, D-29352 Adelheidsdorf, Germany
Keywords
unsupervised machine learning; dimensionality reduction; high-dimensional data visualization; information visualization; projection methods; quality measurement; TOPOLOGY PRESERVATION; NEIGHBORHOOD PRESERVATION; DATA VISUALIZATION; GRAPH; MAPS; FRAMEWORK; POINTS; FIT
DOI
10.3390/make5030056
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Dimensionality reduction methods project high-dimensional data into a low-dimensional space. If the output space is restricted to two dimensions, the result is a scatter plot whose goal is to present insightful visualizations of distance- and density-based structures. The topological invariance of dimension implies that the two-dimensional similarities in the scatter plot cannot necessarily represent high-dimensional distances. In practice, projections of several datasets with distance- and density-based structures lead to misleading interpretations of the underlying structures. These examples show that the evaluation of projections remains essential. Here, 19 unsupervised quality measurements (QMs) are grouped into semantic classes with the aid of graph theory. We use three representative benchmark datasets to show that QMs fail to evaluate the projections of straightforward structures when common methods such as Principal Component Analysis (PCA), Uniform Manifold Approximation and Projection (UMAP), or t-distributed stochastic neighbor embedding (t-SNE) are applied. This work shows that unsupervised QMs are biased towards assumed underlying structures. Based on insights gained from graph theory, we propose a new quality measurement called the Gabriel Classification Error (GCE). This work demonstrates that GCE can make an unbiased evaluation of projections. The GCE is accessible within the R package DRquality, available on CRAN.
Pages: 1076-1118
Number of pages: 43