LDSScanner: Exploratory Analysis of Low-Dimensional Structures in High-Dimensional Datasets

Cited by: 61
Authors
Xia, Jiazhi [1]
Ye, Fenjin [1]
Chen, Wei [2]
Wang, Yusi [1]
Chen, Weifeng [3]
Ma, Yuxin [2]
Tung, Anthony K. H. [4]
Affiliations
[1] Cent South Univ, Changsha, Hunan, Peoples R China
[2] Zhejiang Univ, Hangzhou, Zhejiang, Peoples R China
[3] Zhejiang Univ Finance & Econ, Hangzhou, Zhejiang, Peoples R China
[4] Natl Univ Singapore, Singapore, Singapore
Funding
US National Science Foundation; Major Program of the National Natural Science Foundation of China;
Keywords
High-dimensional data; low-dimensional structure; subspace; manifold; visual exploration; VISUAL EXPLORATION; VISUALIZATION; REDUCTION; METRICS;
DOI
10.1109/TVCG.2017.2744098
Chinese Library Classification number
TP31 [Computer Software];
Subject classification codes
081202 ; 0835 ;
Abstract
Many approaches for analyzing a high-dimensional dataset assume that the dataset contains specific structures, e.g., clusters in linear subspaces or non-linear manifolds. This yields a trial-and-error process to verify the appropriate model and parameters. This paper contributes an exploratory interface that supports visual identification of low-dimensional structures in a high-dimensional dataset, and facilitates the optimized selection of data models and configurations. Our key idea is to abstract a set of global and local feature descriptors from the neighborhood graph-based representation of the latent low-dimensional structure, such as pairwise geodesic distance (GD) among points and pairwise local tangent space divergence (LTSD) among pointwise local tangent spaces (LTS). We propose a new LTSD-GD view, which is constructed by mapping LTSD and GD to the x axis and y axis using 1D multidimensional scaling, respectively. Unlike traditional dimensionality reduction methods that preserve various kinds of distances among points, the LTSD-GD view presents the distribution of pointwise LTS (x axis) and the variation of LTS in structures (the combination of x axis and y axis). We design and implement a suite of visual tools for navigating and reasoning about intrinsic structures of a high-dimensional dataset. Three case studies verify the effectiveness of our approach.
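The construction of the LTSD-GD view described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not define LTSD precisely, so the sketch assumes local tangent spaces are estimated by local PCA over k nearest neighbors and that LTSD is the norm of the principal angles between two tangent spaces. It also assumes classical (eigendecomposition-based) MDS for the 1D mappings and a simple Floyd-Warshall pass for geodesic distances. All function names and the parameters `k` and `d` are hypothetical.

```python
import numpy as np

def knn_graph(X, k):
    """Return (edge weights, neighbor indices) of a symmetric k-NN graph."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    idx = np.argsort(D, axis=1)[:, 1:k + 1]  # column 0 is the point itself
    W = np.full(D.shape, np.inf)             # inf marks a missing edge
    np.fill_diagonal(W, 0.0)
    rows = np.arange(X.shape[0])[:, None]
    W[rows, idx] = D[rows, idx]
    return np.minimum(W, W.T), idx

def geodesic_distances(W):
    """All-pairs shortest paths on the graph (Floyd-Warshall)."""
    G = W.copy()
    for m in range(G.shape[0]):
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    # Assumption: unreachable pairs are capped at the largest finite distance.
    G[~np.isfinite(G)] = G[np.isfinite(G)].max()
    return G

def local_tangent_spaces(X, idx, d):
    """Assumed estimator: top-d PCA directions of each neighborhood."""
    bases = []
    for i in range(X.shape[0]):
        nbrs = X[np.append(idx[i], i)]
        centered = nbrs - nbrs.mean(axis=0)
        _, _, Vt = np.linalg.svd(centered, full_matrices=False)
        bases.append(Vt[:d].T)  # orthonormal columns spanning the tangent space
    return bases

def tangent_divergence(bases):
    """Assumed LTSD: norm of principal angles between two tangent spaces."""
    n = len(bases)
    L = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            s = np.linalg.svd(bases[i].T @ bases[j], compute_uv=False)
            angles = np.arccos(np.clip(s, -1.0, 1.0))
            L[i, j] = L[j, i] = np.linalg.norm(angles)
    return L

def mds_1d(Dm):
    """Classical MDS to one dimension via double centering."""
    n = Dm.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (Dm ** 2) @ J
    vals, vecs = np.linalg.eigh(B)  # eigenvalues in ascending order
    return vecs[:, -1] * np.sqrt(max(vals[-1], 0.0))

def ltsd_gd_view(X, k=8, d=2):
    """2D coordinates: x from LTSD, y from geodesic distance."""
    W, idx = knn_graph(X, k)
    gd = geodesic_distances(W)
    ltsd = tangent_divergence(local_tangent_spaces(X, idx, d))
    return np.column_stack([mds_1d(ltsd), mds_1d(gd)])

# Toy data: points on a half-cylinder, whose tangent planes rotate with theta.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, np.pi, 100)
height = rng.uniform(0.0, 1.0, 100)
X = np.column_stack([np.cos(theta), np.sin(theta), height])
view = ltsd_gd_view(X)
print(view.shape)
```

In this toy setting, the x coordinate should vary with the orientation of the local tangent plane, while the y coordinate follows position along the graph. The choice of `k` and the intrinsic dimension `d` are exactly the kind of configuration the paper's interface is meant to help select.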
Pages: 236-245
Page count: 10