Big Social Data Analytics in Journalism and Mass Communication: Comparing Dictionary-Based Text Analysis and Unsupervised Topic Modeling

Cited by: 156
Authors
Guo, Lei [1]
Vargo, Chris J. [2]
Pan, Zixuan [3]
Ding, Weicong [4]
Ishwar, Prakash [1]
Affiliations
[1] Boston Univ, Boston, MA 02215 USA
[2] Univ Alabama, Tuscaloosa, AL USA
[3] Yodlee, Redwood City, CA USA
[4] Technicolor Res, Los Altos, CA USA
Funding
U.S. National Science Foundation
Keywords
computer-assisted content analysis; unsupervised machine learning; topic modeling; political communication; Twitter
DOI
10.1177/1077699016639231
Chinese Library Classification number
G2 [Information and Knowledge Dissemination]
Discipline classification code
05; 0503
Abstract
This article presents an empirical study that investigated and compared two big data text analysis methods: dictionary-based analysis, perhaps the most popular automated analysis approach in social science research, and unsupervised topic modeling (i.e., Latent Dirichlet Allocation [LDA] analysis), one of the most widely used algorithms in the field of computer science and engineering. By applying two big data methods to make sense of the same dataset (77 million tweets about the 2012 U.S. presidential election), the study provides a starting point for scholars to evaluate the efficacy and validity of different computer-assisted methods for conducting journalism and mass communication research, especially in the area of political communication.
Pages: 332-359
Page count: 28