Examining Intermediate Data Reduction Algorithms for use with t-SNE

Cited by: 3
Authors
Campbell, Aaron [1]
Caudle, Kyle [1]
Hoover, Randy C. [1]
Affiliation
[1] South Dakota School of Mines and Technology, Rapid City, SD 57701, USA
Source
Proceedings of the 2019 3rd International Conference on Compute and Data Analysis (ICCDA 2019), 2019
Keywords
data reduction; data visualization; clustering
DOI
10.1145/3314545.3314549
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
t-distributed Stochastic Neighbor Embedding (t-SNE) is a data visualization tool developed to provide a flexible, non-parametric method for mapping high-dimensional data onto a two- or three-dimensional space. This paper examines the effects of first reducing the data to an intermediate subspace with a separate data reduction algorithm (e.g., Principal Component Analysis, Independent Component Analysis, Linear Discriminant Analysis, Sammon Mapping, and Locally Linear Embedding) before applying t-SNE for visualization. Our research shows that no intermediate step in the visualization process is trivial, and that application-dependent knowledge should be used to ensure the best possible visualization in lower-dimensional spaces. Experimental results on several common data sets illustrate that, for clustering applications and for visualizing the class separation of multi-class data, each algorithm tested yields a significantly different mapping.
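The two-stage pipeline described in the abstract (reduce to an intermediate subspace, then run t-SNE) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes NumPy, uses PCA as the intermediate step, and implements a small exact t-SNE (Gaussian input affinities calibrated to a target perplexity, a Student-t output kernel, early exaggeration, momentum, and adaptive gains). The function names `pca_reduce`, `joint_probabilities`, `tsne`, and `reduce_then_tsne`, the PCA-based initialization, the step-size heuristic, and the iteration counts are hypothetical choices made for illustration.

```python
import numpy as np


def pca_reduce(X, d):
    """Stage 1: project X (n x D) onto its top-d principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T


def sq_dists(Y):
    """Pairwise squared Euclidean distances between the rows of Y."""
    s = np.sum(Y * Y, axis=1)
    return np.maximum(s[:, None] + s[None, :] - 2.0 * Y @ Y.T, 0.0)


def joint_probabilities(X, perplexity=30.0, tol=1e-5, max_steps=100):
    """Symmetric t-SNE affinities; each Gaussian width is found by binary search."""
    n = X.shape[0]
    D = sq_dists(X)
    P = np.zeros((n, n))
    target = np.log(perplexity)  # desired entropy of each conditional, in nats
    for i in range(n):
        d_i = np.delete(D[i], i)
        d_i = d_i - d_i.min()  # stability shift; cancels when normalizing
        beta, beta_lo, beta_hi = 1.0, 0.0, np.inf  # beta = 1 / (2 * sigma_i**2)
        for _ in range(max_steps):
            p = np.exp(-beta * d_i)
            p /= p.sum()
            H = -np.sum(p * np.log(np.maximum(p, 1e-12)))
            if abs(H - target) < tol:
                break
            if H > target:  # too flat: narrow the kernel
                beta_lo = beta
                beta = beta * 2.0 if np.isinf(beta_hi) else (beta + beta_hi) / 2.0
            else:  # too peaked: widen the kernel
                beta_hi = beta
                beta = (beta + beta_lo) / 2.0
        P[i, np.arange(n) != i] = p
    P = np.maximum((P + P.T) / (2.0 * n), 1e-12)
    np.fill_diagonal(P, 0.0)
    return P


def tsne(X, n_components=2, perplexity=30.0, n_iter=500, exaggeration=12.0, lr=None):
    """Stage 2: exact O(n^2) t-SNE via gradient descent with momentum and gains."""
    n = X.shape[0]
    P = joint_probabilities(X, perplexity)
    if lr is None:
        lr = max(n / (4.0 * exaggeration), 1.0)  # hypothetical step-size heuristic
    # Assumed initialization: top PCs of the input, rescaled to a tiny spread.
    Y = pca_reduce(X, n_components)
    Y = Y / Y[:, 0].std() * 1e-4
    velocity = np.zeros_like(Y)
    gains = np.ones_like(Y)
    for it in range(n_iter):
        alpha = exaggeration if it < 100 else 1.0  # early exaggeration of P
        momentum = 0.5 if it < 100 else 0.8
        num = 1.0 / (1.0 + sq_dists(Y))  # Student-t kernel, one degree of freedom
        np.fill_diagonal(num, 0.0)
        Q = num / num.sum()
        # dC/dy_i = 4 * sum_j (alpha * p_ij - q_ij) * num_ij * (y_i - y_j)
        W = (alpha * P - Q) * num
        grad = 4.0 * (np.diag(W.sum(axis=1)) - W) @ Y
        # Adaptive gains: grow while the descent direction is stable, else shrink.
        gains = np.where(velocity * grad < 0.0, gains + 0.2, gains * 0.8)
        gains = np.maximum(gains, 0.01)
        velocity = momentum * velocity - lr * gains * grad
        Y = Y + velocity
    return Y


def reduce_then_tsne(X, intermediate_dim, perplexity=30.0):
    """Two-stage pipeline: intermediate PCA reduction, then t-SNE to 2-D."""
    Z = pca_reduce(X, intermediate_dim)
    return tsne(Z, perplexity=perplexity)
```

For example, `reduce_then_tsne(X, intermediate_dim=10, perplexity=10.0)` on a 60 × 50 array returns a 60 × 2 embedding. Comparing intermediate algorithms, as the paper does, amounts to replacing `pca_reduce` with another reducer such as ICA, Sammon mapping, or LLE; LDA additionally requires class labels. The perplexity must be smaller than the number of points, and for anything beyond small examples a library implementation such as scikit-learn's `sklearn.manifold.TSNE` is the practical choice.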
Pages: 36–42
Number of pages: 7