Mining Ethnic Content Online with Additively Regularized Topic Models

Times cited: 4
Authors
Apishev, Murat [2, 3, 4, 5]
Koltcov, Sergei [1, 2]
Koltsova, Olessia [1]
Nikolenko, Sergey [1, 4]
Vorontsov, Konstantin [6, 7]
Affiliations
[1] Natl Res Univ, Higher Sch Econ, Lab Internet Studies, St Petersburg, Russia
[2] Natl Res Univ, Higher Sch Econ, Dept Appl Math & Comp Sci, St Petersburg, Russia
[3] Moscow MV Lomonosov State Univ, Moscow, Russia
[4] Steklov Inst Math St Petersburg, Lab Math Log, St Petersburg, Russia
[5] Yandex, Search Dept, Moscow, Russia
[6] Yandex, Moscow, Russia
[7] Moscow Inst Phys & Technol, Moscow, Russia
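Source
COMPUTACION Y SISTEMAS | 2016 / Vol. 20 / No. 3
Funding
Russian Science Foundation
Keywords
Topic modeling; additive regularization of topic models; computational social science
DOI
10.13053/CyS-20-3-2473
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract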
Social studies of the Internet have adopted large-scale text mining for unsupervised discovery of topics related to specific subjects. A recently developed approach to topic modeling, additive regularization of topic models (ARTM), provides fast inference and, through a wide variety of possible regularizers, more control over the topics than developing LDA extensions does. We apply ARTM to mining ethnic-related content from the Russian-language blogosphere, introduce a new combined regularizer, and compare ARTM-derived models with LDA. Human evaluations show that ARTM is better for mining topics on specific subjects, finding more relevant topics of higher or comparable quality.
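The ARTM idea summarized in the abstract can be illustrated with a minimal sketch, assuming the standard regularized EM iterations of ARTM: the E-step is the same as in PLSA, and the M-step adds the regularizer's contribution to the expected counts, phi_wt ∝ (n_wt + phi_wt * dR/dphi_wt)_+ (and likewise for theta_td). The example uses only a simple smoothing/sparsing regularizer R = tau * sum(ln phi_wt), for which phi_wt * dR/dphi_wt = tau. It is not the combined regularizer introduced in the paper, and the names artm_fit, tau_phi and tau_theta are hypothetical.

```python
import numpy as np


def artm_fit(n_dw, num_topics, tau_phi=0.0, tau_theta=0.0, iters=50, seed=0):
    """Hypothetical minimal ARTM-style EM loop (illustrative, not the paper's code).

    n_dw:      (D, W) matrix of word counts per document.
    tau_phi:   coefficient of R = tau_phi * sum(ln phi_wt); >0 smooths, <0 sparsifies.
    tau_theta: same, for theta.
    Returns phi (W, T) with columns p(w|t) and theta (T, D) with columns p(t|d).
    """
    rng = np.random.default_rng(seed)
    num_docs, num_words = n_dw.shape
    phi = rng.random((num_words, num_topics))
    phi /= phi.sum(axis=0, keepdims=True)
    theta = rng.random((num_topics, num_docs))
    theta /= theta.sum(axis=0, keepdims=True)
    eps = 1e-12  # floor that avoids division by zero
    for _ in range(iters):
        # E-step: p(t|d,w) = phi_wt * theta_td / p(w|d), folded into expected counts.
        p_dw = theta.T @ phi.T                  # (D, W): p(w|d) under the model
        ratio = n_dw / np.maximum(p_dw, eps)    # n_dw / p(w|d)
        n_wt = phi * (ratio.T @ theta.T)        # (W, T) expected word-topic counts
        n_td = theta * (phi.T @ ratio.T)        # (T, D) expected topic-document counts
        # M-step with additive regularization: for R = tau * sum(ln phi),
        # phi * dR/dphi = tau, so tau is added to the counts before
        # truncating at zero and normalizing each column.
        phi = np.maximum(n_wt + tau_phi, 0.0)
        theta = np.maximum(n_td + tau_theta, 0.0)
        phi /= np.maximum(phi.sum(axis=0, keepdims=True), eps)
        theta /= np.maximum(theta.sum(axis=0, keepdims=True), eps)
    return phi, theta


if __name__ == "__main__":
    counts = np.random.default_rng(1).integers(0, 5, size=(20, 30)).astype(float)
    phi, theta = artm_fit(counts, num_topics=4, tau_phi=-0.05)
    print(phi.shape, theta.shape)  # (30, 4) (4, 20)
```

With tau_phi = tau_theta = 0 the iterations reduce to plain PLSA; positive values smooth the distributions, while negative values push small probabilities to exactly zero (sparsing). Further regularizers are combined simply by summing their terms in the M-step.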
Pages: 387-403
Page count: 17