Semantic Visualization with Neighborhood Graph Regularization

Cited by: 5
Authors
Le, Tuan M. V. [1]
Lauw, Hady W. [1]
Affiliation
[1] Singapore Management Univ, Sch Informat Syst, 80 Stamford Rd, Singapore 178902, Singapore
Keywords
DIMENSIONALITY REDUCTION; WIKIPEDIA;
DOI
10.1613/jair.4983
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Visualization of high-dimensional data, such as text documents, is useful for mapping out the similarities among data points. In the high-dimensional space, documents are commonly represented as bags of words, with dimensionality equal to the vocabulary size. Classical approaches to document visualization reduce this representation directly to two or three visualizable dimensions. Recent approaches consider an intermediate representation in topic space, between word space and visualization space, which preserves the semantics by topic modeling. While aiming for a good fit between the model parameters and the observed data, previous approaches have not considered the local consistency among data instances. We consider the problem of semantic visualization by jointly modeling topics and visualization on the intrinsic document manifold, modeled using a neighborhood graph. Each document has both a topic distribution and a visualization coordinate. Specifically, we propose an unsupervised probabilistic model, called Semafore, which aims to preserve the manifold in the lower-dimensional spaces through a neighborhood regularization framework designed for the semantic visualization task. Comprehensive experiments on a number of real-life text datasets of news articles and Web pages show that the proposed methods outperform the state-of-the-art baselines on objective evaluation metrics.
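The abstract does not specify how the neighborhood graph is built or what form the regularizer takes, so the following is only a minimal sketch of the general idea under stated assumptions: a symmetric k-nearest-neighbor graph over bag-of-words vectors using cosine similarity, and a generic graph-Laplacian smoothness penalty on low-dimensional coordinates. It is not the Semafore model. The function names `knn_graph` and `neighborhood_penalty`, the choice of cosine similarity, and the squared-distance penalty are all illustrative assumptions.

```python
import numpy as np


def knn_graph(X, k):
    """Symmetric k-NN adjacency matrix from cosine similarity of the rows of X.

    Assumption: cosine similarity on bag-of-words rows; the source does not
    say which similarity or graph construction is used.
    """
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, 1e-12)      # guard against all-zero rows
    sim = Xn @ Xn.T
    np.fill_diagonal(sim, -np.inf)         # a document is not its own neighbor
    n = X.shape[0]
    nbrs = np.argsort(-sim, axis=1)[:, :k]  # indices of the k most similar docs
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return np.maximum(W, W.T)              # edge if either side lists the other


def neighborhood_penalty(W, Y):
    """0.5 * sum_ij W_ij * ||y_i - y_j||^2, which equals trace(Y^T L Y), L = D - W.

    Assumption: a generic smoothness penalty; small values mean that graph
    neighbors have nearby coordinates.
    """
    diff = Y[:, None, :] - Y[None, :, :]
    sq_dist = (diff ** 2).sum(axis=2)
    return 0.5 * float((W * sq_dist).sum())
```

In a sketch like this, a penalty of this kind would be added to a model's fitting objective so that documents linked in the graph are pulled toward similar low-dimensional representations; how Semafore actually combines the two is described in the paper itself, not here.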
Pages: 1091-1133
Number of pages: 43