Attribute-based Explanation of Non-Linear Embeddings of High-Dimensional Data

Cited by: 14
Authors
Sohns, Jan-Tobias [1]
Schmitt, Michaela [1]
Jirasek, Fabian [2]
Hasse, Hans [2]
Leitte, Heike [1]
Affiliations
[1] TU Kaiserslautern, Visual Informat Anal Grp, Kaiserslautern, Germany
[2] TU Kaiserslautern, Lab Engn Thermodynam LTD, Kaiserslautern, Germany
Keywords
Data visualization; Visualization; Task analysis; Data analysis; Topology; Image color analysis; Dimensionality reduction; embedding; augmented projections; point set contours; explainable artificial intelligence; PREDICTION; REDUCTION; POINTS; SET;
DOI
10.1109/TVCG.2021.3114870
Chinese Library Classification (CLC) number
TP31 [Computer software];
Subject classification code
081202 ; 0835 ;
Abstract
Embeddings of high-dimensional data are widely used to explore data, to verify analysis results, and to communicate information. Their explanation, in particular with respect to the input attributes, is often difficult. With linear projections like PCA, the axes can still be annotated meaningfully. With non-linear projections, this is no longer possible, and alternative strategies such as attribute-based color coding are required. In this paper, we review existing augmentation techniques and discuss their limitations. We present the Non-Linear Embeddings Surveyor (NoLiES), which combines a novel augmentation strategy for projected data (rangesets) with interactive analysis in a small multiples setting. Rangesets use a set-based visualization approach for binned attribute values that enables the user to quickly observe structure and detect outliers. We detail the link between algebraic topology and rangesets and demonstrate the utility of NoLiES in case studies with various challenges (complex attribute value distribution, many attributes, many data points) and a real-world application to understand latent features of matrix completion in thermodynamics.
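The idea of a set-based view of binned attribute values can be illustrated with a minimal sketch. It is not the NoLiES implementation: the function names `convex_hull` and `bin_outlines` are hypothetical, the bins are assumed to be equal-width, and a plain convex hull is swapped in for whatever contour construction the paper actually uses for rangesets.

```python
import numpy as np


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order.

    Stand-in outline only (assumption): the paper's rangeset contours are
    not specified in the abstract and are likely tighter than a convex hull.
    """
    pts = sorted({(float(x), float(y)) for x, y in points})
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Last point of each chain is the first point of the other chain
    return lower[:-1] + upper[:-1]


def bin_outlines(embedding, attribute, n_bins=3):
    """Group projected 2D points by equal-width attribute bin; outline each group."""
    edges = np.linspace(attribute.min(), attribute.max(), n_bins + 1)
    # Digitizing against the interior edges yields bin indices 0..n_bins-1
    idx = np.digitize(attribute, edges[1:-1])
    return {
        b: convex_hull(embedding[idx == b])
        for b in range(n_bins)
        if np.any(idx == b)
    }


# Toy data (made up for illustration): two clusters in the 2D embedding,
# one with low and one with high values of a single input attribute.
embedding = np.array(
    [[0, 0], [1, 0], [1, 1], [0, 1], [0.5, 0.5], [5, 5], [6, 5], [6, 6]],
    dtype=float,
)
attribute = np.array([0, 0, 0, 0, 0, 10, 10, 10], dtype=float)
outlines = bin_outlines(embedding, attribute, n_bins=2)
```

Drawing one outline per bin on top of the scatterplot gives a rough impression of where each value range sits in the embedding; a point whose bin outline is stretched far from the rest of its bin would stand out as a potential outlier.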
Pages: 540-550
Page count: 11