Exemplar-based Visualization of Large Document Corpus

Cited by: 50
Authors
Chen, Yanhua [1]
Wang, Lijun [1]
Dong, Ming [1]
Hua, Jing [2]
Affiliations
[1] Wayne State Univ, Dept Comp Sci, Machine Vis & Pattern Recognit Lab, Detroit, MI 48202 USA
[2] Wayne State Univ, Dept Comp Sci, Graph & Imaging Lab, Detroit, MI 48202 USA
Funding
National Science Foundation (USA)
Keywords
Exemplar; large-scale document visualization; multidimensional projection; EXPLORATION; ALGORITHM;
DOI
10.1109/TVCG.2009.140
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
With the rapid growth of the World Wide Web and electronic information services, text corpora are becoming available online at an incredible rate. By displaying text data in a logical layout (e.g., color graphs), text visualization offers a direct way to observe the documents and to understand the relationships among them. In this paper, we propose a novel technique, Exemplar-based Visualization (EV), to visualize an extremely large text corpus. Capitalizing on recent advances in matrix approximation and decomposition, EV presents a probabilistic multidimensional projection model in the low-rank text subspace with a sound objective function. Each document's probabilistic proportions over the topics are obtained through iterative optimization and embedded into a low-dimensional space using parameter embedding. By selecting representative exemplars, we obtain a compact approximation of the data, which makes the visualization highly efficient and flexible. In addition, the selected exemplars neatly summarize the entire data set and greatly reduce the cognitive overload in the visualization, making a large text corpus easier to interpret. Empirically, we demonstrate the superior performance of EV through extensive experiments on publicly available text data sets.
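The pipeline outlined in the abstract (low-rank decomposition, per-document topic proportions, low-dimensional embedding, exemplar selection) can be illustrated with a minimal sketch. This is not the authors' EV algorithm: plain non-negative matrix factorization with multiplicative updates stands in for the paper's decomposition, placing topics on a circle and taking convex combinations stands in for parameter embedding, and picking the highest-proportion document per topic stands in for exemplar selection. The toy matrix and all function names are hypothetical.

```python
import numpy as np

def nmf(X, k, iters=200, seed=0):
    """Approximate X by W @ H with non-negative factors (multiplicative updates).
    Stand-in for the paper's matrix decomposition, not the EV objective."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + 1e-3
    H = rng.random((k, m)) + 1e-3
    eps = 1e-9  # avoids division by zero
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def topic_proportions(W):
    """Row-normalize W so each document's topic weights sum to 1."""
    return W / (W.sum(axis=1, keepdims=True) + 1e-12)

def embed_2d(P):
    """Assumed simplification of the embedding step: put each topic on the
    unit circle and place each document at its proportion-weighted average."""
    k = P.shape[1]
    angles = 2 * np.pi * np.arange(k) / k
    anchors = np.column_stack([np.cos(angles), np.sin(angles)])
    return P @ anchors

def pick_exemplars(P):
    """Assumed heuristic: for each topic, the document with the largest proportion."""
    return P.argmax(axis=0)

# Made-up document-term counts: 6 documents x 5 terms, two rough groups.
X = np.array([
    [3, 2, 1, 0, 0],
    [2, 3, 1, 0, 0],
    [3, 3, 2, 0, 0],
    [0, 0, 1, 3, 2],
    [0, 0, 0, 2, 3],
    [0, 1, 0, 3, 3],
], dtype=float)

W, H = nmf(X, k=2)
P = topic_proportions(W)      # documents x topics
Y = embed_2d(P)               # documents x 2 coordinates for plotting
exemplars = pick_exemplars(P) # one document index per topic
```

Plotting only the rows of `Y` indexed by `exemplars` gives the kind of compact summary the abstract describes, with the remaining documents available on demand.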
Pages: 1161-1168
Page count: 8