An Interactive Visual Testbed System for Dimension Reduction and Clustering of Large-scale High-dimensional Data

Cited by: 10
Authors
Choo, Jaegul [1]
Lee, Hanseung [1]
Liu, Zhicheng [1]
Stasko, John [1]
Park, Haesun [1]
Affiliations
[1] Georgia Inst Technol, Atlanta, GA 30332 USA
Source
VISUALIZATION AND DATA ANALYSIS 2013 | 2013 / Vol. 8654
Keywords
clustering; dimension reduction; high-dimensional data; visual knowledge discovery; NONLINEAR DISCRIMINANT-ANALYSIS; MANIFOLDS;
DOI
10.1117/12.2007316
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Many modern data sets, such as text and image data, can be represented in high-dimensional vector spaces and have benefited from advanced computational methods. Visual analytics approaches have contributed greatly to data understanding and analysis because they leverage humans' capacity for quick visual perception. However, visual analytics targeting large-scale data such as text and image data has been challenging because limited screen space restricts both the number of data points and the number of features that can be represented. Among the computational methods supporting visual analytics, dimension reduction and clustering have played essential roles by intelligently reducing these numbers to visually manageable sizes. Given the numerous dimension reduction and clustering methods available, however, choosing algorithms and their parameters becomes difficult. In this paper, we present an interactive visual testbed system for dimension reduction and clustering in large-scale high-dimensional data analysis. The testbed system enables users to apply various dimension reduction and clustering methods with different settings, visually compare the results from different algorithmic methods to obtain rich knowledge about the data and tasks at hand, and eventually choose the most appropriate path through a collection of algorithms and parameters. Using various data sets such as documents, images, and others that are already encoded as vectors, we demonstrate how the testbed system can support these tasks.
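The workflow described in the abstract (apply several dimension reduction and clustering settings, then compare the outcomes) can be illustrated with a minimal sketch. This is not the authors' system. All names here (`random_projection`, `keep_top_variance`, `kmeans`, `run_testbed`) are hypothetical, the two reducers are deliberately simple stand-ins for real methods, and the within-cluster error is only a placeholder for the visual comparison the paper relies on.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def random_projection(data, dim, seed=0):
    """Stand-in reducer: project each row onto `dim` random Gaussian directions."""
    rng = random.Random(seed)
    n_features = len(data[0])
    basis = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(dim)]
    return [[sum(x * b for x, b in zip(row, axis)) for axis in basis] for row in data]


def keep_top_variance(data, dim):
    """Stand-in reducer: keep the `dim` original features with the largest variance."""
    n = len(data)

    def variance(col):
        mean = sum(col) / n
        return sum((v - mean) ** 2 for v in col) / n

    cols = list(zip(*data))
    top = sorted(range(len(cols)), key=lambda j: variance(cols[j]), reverse=True)[:dim]
    return [[row[j] for j in top] for row in data]


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def run_testbed(data, reducers, ks):
    """Run every (reducer, k) combination; map (name, k) -> (labels, error)."""
    results = {}
    for name, reduce_fn in reducers.items():
        low_dim = reduce_fn(data)
        for k in ks:
            labels, centers = kmeans(low_dim, k)
            error = sum(sq_dist(p, centers[lab]) for p, lab in zip(low_dim, labels))
            results[(name, k)] = (labels, error)
    return results


# Toy data (an assumption for illustration): two Gaussian blobs in 10 dimensions.
_rng = random.Random(42)
toy = [[_rng.gauss(center, 0.5) for _ in range(10)]
       for center in (0.0, 5.0) for _ in range(30)]

reducers = {
    "random_projection": lambda d: random_projection(d, 2),
    "top_variance": lambda d: keep_top_variance(d, 2),
}
results = run_testbed(toy, reducers, ks=(2, 3, 4))

for (name, k), (labels, error) in results.items():
    print(f"{name:18s} k={k}  within-cluster error={error:.2f}")
```

The error values are not comparable across reducers, since each produces a space with its own scale, and they tend to shrink as k grows. That limitation is one reason a side-by-side visual comparison, as the abstract proposes, is more informative than a single number.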
Pages: 15