ProSeCo: Visual analysis of class separation measures and dataset characteristics

Cited by: 4
Authors
Bernard, Juergen [1, 4]
Hutter, Marco [2]
Zeppelzauer, Matthias [3]
Sedlmair, Michael [2]
Munzner, Tamara [4]
Affiliations
[1] Univ Zurich, Zurich, Switzerland
[2] Univ Stuttgart, Stuttgart, Germany
[3] St Polten Univ Appl Sci, St Polten, Austria
[4] Univ British Columbia, Vancouver, BC, Canada
Source
COMPUTERS & GRAPHICS-UK | 2021 / Vol. 96
Keywords
Computers and Graphics; Formatting; Guidelines; VALIDATION; REDUCTION; OUTLIERS;
DOI
10.1016/j.cag.2021.03.004
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Class separation is an important concept in machine learning and visual analytics. We address the visual analysis of class separation measures for both high-dimensional data and its corresponding projections into 2D through dimensionality reduction (DR) methods. Although a plethora of separation measures have been proposed, it is difficult to compare class separation between multiple datasets with different characteristics, multiple separation measures, and multiple DR methods. We present ProSeCo, an interactive visualization approach to support comparison between up to 20 class separation measures and up to 4 DR methods, with respect to any of 7 dataset characteristics: dataset size, dataset dimensions, class counts, class size variability, class size skewness, outlieriness, and real-world vs. synthetically generated data. ProSeCo supports (1) comparing across measures, (2) comparing high-dimensional to dimensionally reduced 2D data across measures, (3) comparing between different DR methods across measures, (4) partitioning with respect to a dataset characteristic, (5) comparing partitions for a selected characteristic across measures, and (6) inspecting individual datasets in detail. We demonstrate the utility of ProSeCo in two usage scenarios, using datasets [1] posted at https://osf.io/epcf9/. (c) 2021 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Pages: 48-60
Number of pages: 13
Related papers
68 in total
[1]  
Albuquerque G., 2011, 2011 IEEE Conference on Visual Analytics Science and Technology, P13, DOI 10.1109/VAST.2011.6102437
[2]   Synthetic Generation of High-Dimensional Datasets [J].
Albuquerque, Georgia ;
Loewe, Thomas ;
Magnor, Marcus .
IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2011, 17 (12) :2317-2324
[3]  
[Anonymous], 2014, Discriminant analysis and applications
[4]  
[Anonymous], 2012, TECH REP
[5]  
[Anonymous], 2010, International Journal of Computer Science & Engineering Survey, DOI 10.5121/IJCSES.2010.1207
[6]   An extensive comparative study of cluster validity indices [J].
Arbelaitz, Olatz ;
Gurrutxaga, Ibai ;
Muguerza, Javier ;
Perez, Jesus M. ;
Perona, Inigo .
PATTERN RECOGNITION, 2013, 46 (01) :243-256
[7]  
Aupetit M, 2016, IEEE PAC VIS SYMP, P1, DOI 10.1109/PACIFICVIS.2016.7465244
[8]  
Ball G, 1965, 699616 NTIS STANF RE
[9]  
Bernard J, 2017, EUROGRAPHICS ASS
[10]  
Bernard J, 2021, PROSECO PROBING SEPA