Can topic models be used in research evaluations? Reproducibility, validity, and reliability when compared with semantic maps

Cited by: 13
Authors
Hecking, Tobias [1]
Leydesdorff, Loet [2]
Affiliations
[1] Univ Duisburg Essen, Dept Comp Sci & Appl Cognit Sci, Lotharstr 63, D-47057 Duisburg, Germany
[2] Univ Amsterdam, Amsterdam Sch Commun Res ASCoR, POB 15793, NL-1001 NG Amsterdam, Netherlands
Keywords
topic models; LDA; co-word models; validation; decay; reliability; SCIENCE; IMPACT;
DOI
10.1093/reseval/rvz015
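Chinese Library Classification (CLC) number
G25 [Library science, librarianship]; G35 [Information science, information work]
Discipline classification code
1205; 120501
Abstract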
We replicate and analyze the topic model which was commissioned from King's College and Digital Science for the Research Excellence Framework (REF 2014) in the United Kingdom: 6,638 case descriptions of societal impact were submitted by 154 higher-education institutes. We compare the Latent Dirichlet Allocation (LDA) model with Principal Component Analysis (PCA) of document-term matrices using the same data. Since topic models are almost by definition applied to text corpora which are too large to read, validation of the results of these models is hardly possible; furthermore, the models are irreproducible for a number of reasons. However, removing a small fraction of the documents from the sample (a test for reliability) has on average a larger impact in terms of decay on LDA than on PCA-based models. The semantic coherence of LDA models outperforms PCA-based models. In our opinion, results of the topic models are statistical and should not be used for grant selections and micro decision-making about research without follow-up using domain-specific semantic maps.
Pages: 263-272
Number of pages: 10
Related papers
46 in total
[1]   What is wrong with topic modeling? And how to fix it using search-based software engineering [J].
Agrawal, Amritanshu ;
Fu, Wei ;
Menzies, Tim .
INFORMATION AND SOFTWARE TECHNOLOGY, 2018, 98 :74-88
[2]  
Anderson C., 2008, WIRED, DOI 10.1180/MINMAG.2008.072.1.7
[3]  
[Anonymous], 1976, A theory of semiotics
[4]   Latent Dirichlet allocation [J].
Blei, DM ;
Ng, AY ;
Jordan, MI .
JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 3 (4-5) :993-1022
[5]   Fast unfolding of communities in large networks [J].
Blondel, Vincent D. ;
Guillaume, Jean-Loup ;
Lambiotte, Renaud ;
Lefebvre, Etienne .
JOURNAL OF STATISTICAL MECHANICS-THEORY AND EXPERIMENT, 2008,
[6]  
BRAAM RR, 1991, J AM SOC INFORM SCI, V42, P233, DOI 10.1002/(SICI)1097-4571(199105)42:4<233::AID-ASI1>3.0.CO;2-I
[8]  
Briggle A., 2015, IMPACT SOCIAL SCI BL
[9]   FROM TRANSLATIONS TO PROBLEMATIC NETWORKS - AN INTRODUCTION TO CO-WORD ANALYSIS [J].
CALLON, M ;
COURTIAL, JP ;
TURNER, WA ;
BAUIN, S .
SOCIAL SCIENCE INFORMATION SUR LES SCIENCES SOCIALES, 1983, 22 (02) :191-235
[10]  
Callon M., 1986, Mapping the dynamics of science and technology, P19