Integrating Big Data Into Evaluation: R Code for Topic Identification and Modeling

Citations: 8
Authors
Cintron, Dakota W. [1]
Montrosse-Moorhead, Bianca [2]
Affiliations
[1] Univ Calif San Francisco, Ctr Hlth & Community, San Francisco, CA 94118 USA
[2] Univ Connecticut, Storrs, CT USA
Keywords
big data; evaluation theory; latent Dirichlet allocation; topic modeling
DOI
10.1177/10982140211031640
Chinese Library Classification (CLC) code
C [Social Sciences: General]
Discipline classification code
03; 0303
Abstract
Despite the rising popularity of big data, there is speculation that evaluators have been slow adopters of these new statistical approaches. Several possible reasons have been offered for why this is the case: ethical concerns, institutional capacity, and evaluator capacity and values. In this method note, we address one of these barriers and aim to build evaluator capacity to integrate big data analytics into their studies. We focus our efforts on a specific topic modeling technique referred to as latent Dirichlet allocation (LDA) because of the ubiquitousness of qualitative textual data in evaluation. Given current equity debates, both within evaluation and the communities in which we practice, we analyze 1,796 tweets that use the hashtag #equity with the R packages topicmodels and ldatuning to illustrate the use of LDA. Furthermore, a freely available workbook for implementing LDA topic modeling is provided as Supplemental Material Online.
Pages: 412-436
Page count: 25