UTOPIAN: User-Driven Topic Modeling Based on Interactive Nonnegative Matrix Factorization

Cited by: 194
Authors
Choo, Jaegul [1]
Lee, Changhyun [1]
Reddy, Chandan K. [2]
Park, Haesun [1]
Affiliations
[1] Georgia Inst Technol, Atlanta, GA 30332 USA
[2] Wayne State Univ, Detroit, MI 48202 USA
Funding
National Science Foundation (USA);
Keywords
Latent Dirichlet allocation; nonnegative matrix factorization; topic modeling; visual analytics; interactive clustering; text analytics;
DOI
10.1109/TVCG.2013.212
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification codes
081202 ; 0835 ;
Abstract
Topic modeling has been widely used for analyzing text document collections. Recently, there have been significant advancements in various topic modeling techniques, particularly in the form of probabilistic graphical modeling. State-of-the-art techniques such as Latent Dirichlet Allocation (LDA) have been successfully applied in visual text analytics. However, most of the widely used methods based on probabilistic modeling have drawbacks in terms of consistency across multiple runs and empirical convergence. Furthermore, due to the complexity of its formulation and algorithm, LDA cannot easily incorporate various types of user feedback. To tackle this problem, we propose a reliable and flexible visual analytics system for topic modeling called UTOPIAN (User-driven Topic modeling based on Interactive Nonnegative Matrix Factorization). Centered around its semi-supervised formulation, UTOPIAN enables users to interact with the topic modeling method and steer the result in a user-driven manner. We demonstrate the capability of UTOPIAN via several usage scenarios with real-world document corpora such as the InfoVis/VAST paper data set and product review data sets.
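For readers unfamiliar with the underlying technique, the following is a minimal sketch of topic modeling by plain nonnegative matrix factorization, the family of methods the abstract builds on. It is not the UTOPIAN implementation: the abstract does not state the semi-supervised formulation, so this sketch uses standard multiplicative updates, and the function name `nmf_topics`, the toy vocabulary, and the toy term-document matrix are all assumptions made for illustration.

```python
import numpy as np

def nmf_topics(A, k, n_iter=200, seed=0, eps=1e-9):
    """Approximate a nonnegative term-document matrix A (terms x docs) as W @ H.

    Hypothetical helper: W (terms x k) holds term weights per topic,
    H (k x docs) holds topic weights per document.
    """
    rng = np.random.default_rng(seed)  # fixed seed so repeated runs match
    m, n = A.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry of W and H nonnegative.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy data (assumed): rows are terms, columns are documents.
vocab = ["graph", "layout", "node", "topic", "document", "word"]
A = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 2],
    [0, 0, 2, 3],
    [0, 0, 1, 2],
], dtype=float)

W, H = nmf_topics(A, k=2)
for j in range(W.shape[1]):
    top = np.argsort(W[:, j])[::-1][:3]  # three highest-weighted terms
    print(f"topic {j}:", [vocab[i] for i in top])
```

Each column of `W` can be read as a topic (a weighting over terms) and each column of `H` as a document's mixture of topics. A user-steerable variant would add terms to the objective that pull `W` or `H` toward user-specified values, but the exact form used by UTOPIAN is not given in this record.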
Pages: 1992-2001
Page count: 10