Interactive topic modeling

Cited by: 161
Authors
Hu, Yuening [1]
Boyd-Graber, Jordan [2, 3]
Satinoff, Brianna [1]
Smith, Alison [1]
Affiliations
[1] Univ Maryland, College Pk, MD 20742 USA
[2] Univ Maryland, ISch, College Pk, MD 20742 USA
[3] Univ Maryland, UMIACS, College Pk, MD 20742 USA
Funding
US National Science Foundation
Keywords
Topic models; Latent Dirichlet Allocation; Feedback; Interactive topic modeling; Online learning; Gibbs sampling
DOI
10.1007/s10994-013-5413-0
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Topic models are a useful and ubiquitous tool for understanding large corpora. However, topic models are not perfect, and for many users in computational social science, digital humanities, and information studies (who are not machine learning experts), existing models and frameworks are often a "take it or leave it" proposition. This paper presents a mechanism for giving users a voice by encoding their feedback as correlations between words in a topic model. This framework, interactive topic modeling (ITM), allows untrained users to encode their feedback easily and iteratively into the topic models. Because latency in interactive systems is crucial, we develop more efficient inference algorithms for tree-based topic models. We validate the framework with both simulated and real users.
Pages: 423-469
Page count: 47