Applying LDA Topic Modeling in Communication Research: Toward a Valid and Reliable Methodology

Cited by: 479
Authors
Maier, Daniel [1]
Waldherr, A. [2]
Miltner, P. [1]
Wiedemann, G. [3]
Niekler, A. [3]
Keinert, A. [1]
Pfetsch, B. [1]
Heyer, G. [3]
Reber, U. [4]
Haeussler, T. [4]
Schmid-Petri, H. [5]
Adam, S. [4]
Affiliations
[1] Free Univ Berlin, Inst Media & Commun Studies, Berlin, Germany
[2] Univ Munster, Dept Commun, Munster, Germany
[3] Univ Leipzig, Comp Sci Inst, Leipzig, Germany
[4] Univ Bern, Inst Commun & Media Studies, Bern, Switzerland
[5] Univ Passau, Passau, Germany
Funding
Swiss National Science Foundation
Keywords
TEXT
DOI
10.1080/19312458.2018.1430754
Chinese Library Classification
G2 [Information and Knowledge Dissemination]
Discipline classification codes
05; 0503
Abstract
Latent Dirichlet allocation (LDA) topic models are increasingly being used in communication research. Yet, questions regarding reliability and validity of the approach have received little attention thus far. In applying LDA to textual data, researchers need to tackle at least four major challenges that affect these criteria: (a) appropriate pre-processing of the text collection; (b) adequate selection of model parameters, including the number of topics to be generated; (c) evaluation of the model's reliability; and (d) the process of validly interpreting the resulting topics. We review the research literature dealing with these questions and propose a methodology that approaches these challenges. Our overall goal is to make LDA topic modeling more accessible to communication researchers and to ensure compliance with disciplinary standards. Consequently, we develop a brief hands-on user guide for applying LDA topic modeling. We demonstrate the value of our approach with empirical data from an ongoing research project.
Pages: 93-118
Page count: 26