A comparative user study of visualization techniques for cluster analysis of multidimensional data sets

Cited by: 12
Authors
Ventocilla, Elio [1]
Riveiro, Maria [1, 2]
Affiliations
[1] Univ Skovde, Sch Informat, SE-54131 Skovde, Sweden
[2] Univ Jonkoping, Sch Engn, Jonkoping, Sweden
Keywords
Cluster patterns; visualization; data structure; user study; multidimensional data; dimensionality reduction; projection; taxonomy
DOI
10.1177/1473871620922166
Chinese Library Classification
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
This article presents an empirical user study that compares eight multidimensional projection techniques for supporting the estimation of the number of clusters, k, embedded in six multidimensional data sets. The techniques were selected because they are designed, or used, to visually encode data structure, that is, neighborhood relations between data points or groups of data points in a data set. Concretely, we study: the difference between the estimates of k given by participants when using different multidimensional projections; the accuracy of user estimates with respect to the number of labels in the data sets; the perceived usability of each multidimensional projection; whether user estimates disagree with the k values given by a set of cluster quality measures; and whether experienced and novice users differ in their estimates and perceived usability. The results show that: dendrograms (from Ward's hierarchical clustering) are likely to lead to estimates of k that differ from those given with other multidimensional projections, while Star Coordinates and Radial Visualizations are likely to lead to similar estimates; t-distributed Stochastic Neighbor Embedding (t-SNE) is likely to lead to estimates that are closer to the number of labels in a data set; cluster quality measures are likely to produce estimates that differ from those given by users of Ward dendrograms and t-SNE; U-Matrices and reachability plots are likely to have low perceived usability; and there is no statistically significant difference between the answers of experienced and novice users. Moreover, as data dimensionality increases, cluster quality measures are likely to produce estimates that differ from those perceived by users of any of the assessed multidimensional projections.
It is also apparent that the inherent complexity of a data set, as well as the capability of each visual technique to disclose such complexity, has an influence on the perceived usability.
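The abstract does not say which cluster quality measures were used or how participants read a dendrogram. As a rough, hypothetical illustration of the two routes to k it contrasts (reading a Ward dendrogram versus computing a numeric quality measure), the Python sketch below runs a naive Ward agglomeration and then estimates k in two ways. The first cuts the dendrogram just below its largest jump in merge cost, a heuristic assumed here as a stand-in for visual inspection. The second picks the k with the highest mean silhouette width, one common quality measure chosen here as an assumption. The helper names and the toy data set `BLOBS` are hypothetical and are not taken from the study.

```python
import math
from itertools import combinations

# Hypothetical toy data: three well-separated 2-D blobs of four points each.
BLOBS = [
    (0.0, 0.0), (0.3, 0.1), (0.1, 0.4), (0.4, 0.3),
    (5.0, 5.0), (5.2, 4.8), (4.9, 5.3), (5.3, 5.1),
    (0.0, 6.0), (0.2, 6.3), (-0.2, 5.8), (0.1, 5.9),
]


def centroid(cluster):
    """Coordinate-wise mean of a list of equal-length points."""
    return [sum(p[d] for p in cluster) / len(cluster) for d in range(len(cluster[0]))]


def ward_merge_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq


def ward_agglomerate(points):
    """Naive Ward clustering.

    Returns (costs, partitions): costs[i] is the cost of the i-th merge and
    partitions[k] is the list of clusters present when k clusters remain.
    """
    clusters = [[p] for p in points]
    partitions = {len(clusters): [list(c) for c in clusters]}
    costs = []
    while len(clusters) > 1:
        # Brute-force search for the cheapest pair of clusters to merge.
        cost, i, j = min(
            (ward_merge_cost(clusters[i], clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        costs.append(cost)
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
        partitions[len(clusters)] = [list(c) for c in clusters]
    return costs, partitions


def k_from_largest_gap(costs, n):
    """Cut the dendrogram just below its largest jump in merge cost."""
    gaps = [later - earlier for earlier, later in zip(costs, costs[1:])]
    i = max(range(len(gaps)), key=gaps.__getitem__)
    # After merges 0..i have been applied, n - (i + 1) clusters remain.
    return n - (i + 1)


def mean_silhouette(clusters):
    """Mean silhouette width over all points; points in singleton clusters score 0."""
    scores = []
    for ci, cluster in enumerate(clusters):
        for p in cluster:
            if len(cluster) == 1:
                scores.append(0.0)
                continue
            # The sum includes dist(p, p) == 0, so dividing by n - 1 gives the
            # mean distance to the other members of p's own cluster.
            a = sum(math.dist(p, q) for q in cluster) / (len(cluster) - 1)
            # Mean distance to the nearest other cluster.
            b = min(
                sum(math.dist(p, q) for q in other) / len(other)
                for cj, other in enumerate(clusters) if cj != ci
            )
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def best_k_by_silhouette(partitions, k_max):
    """Pick the k in 2..k_max whose Ward partition has the highest mean silhouette."""
    return max(range(2, k_max + 1), key=lambda k: mean_silhouette(partitions[k]))


if __name__ == "__main__":
    costs, partitions = ward_agglomerate(BLOBS)
    print("k from largest dendrogram gap:", k_from_largest_gap(costs, len(BLOBS)))
    print("k from mean silhouette:", best_k_by_silhouette(partitions, k_max=5))
```

On these three well-separated toy blobs both heuristics recover k = 3; the abstract's finding is that on the study's real data sets, quality-measure estimates are likely to differ from what users read off some projections. The brute-force pair search is only meant for tiny inputs; for real data, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage(X, method="ward")` is the practical choice.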
Pages: 318-338
Page count: 21