Reducing systematic review workload through certainty-based screening

Cited by: 107
Authors
Miwa, Makoto [1, 2, 3]
Thomas, James [4]
O'Mara-Eves, Alison [4]
Ananiadou, Sophia [1, 2]
Affiliations
[1] Univ Manchester, Natl Ctr Text Min, Manchester M1 7DN, Lancs, England
[2] Univ Manchester, Sch Comp Sci, Manchester Inst Biotechnol, Manchester M1 7DN, Lancs, England
[3] Toyota Technol Inst, Tempaku Ku, Nagoya, Aichi 4688511, Japan
[4] Univ London, Evidence Policy & Practice Informat & Coordinatin, Inst Educ, Social Sci Res Unit, London, England
Funding
UK Medical Research Council
Keywords
Systematic reviews; Text mining; Certainty; Active learning; Classification
DOI
10.1016/j.jbi.2014.06.005
Chinese Library Classification number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
In systematic reviews, the growing number of published studies imposes a significant screening workload on reviewers. Active learning is a promising approach to reducing this workload by automating some of the screening decisions, but it has been evaluated in only a limited number of disciplines. The suitability of active learning for complex topics in disciplines such as social science has not been studied, and the selection of useful criteria and enhancements to address the data imbalance problem in systematic reviews remains an open problem. We applied active learning with two criteria (certainty and uncertainty) and several enhancements in both clinical medicine and social science (specifically, public health), and compared the results across the two areas. The results show that the certainty criterion is useful for finding relevant documents, and that weighting positive instances is a promising way to overcome the data imbalance problem in both data sets. Latent Dirichlet allocation (LDA) is also shown to be promising when little manually assigned information is available. Active learning is effective for complex topics, although its efficiency is limited by the difficulty of text classification. The most promising criterion and weighting method are the same regardless of the review topic, and unsupervised techniques such as LDA may boost the performance of active learning without manual annotation. © 2014 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/3.0/).
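The two selection criteria and the positive-instance weighting named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: this record does not describe their classifier or features, so a small logistic regression stands in as an assumption, and every name here (`screen`, `pick`, `train`, `make_toy_pool`, `pos_weight`) is hypothetical. The toy pool is synthetic and deliberately easy to separate.

```python
import math
import random


def logit(w, x):
    """Linear decision score; w[-1] is the bias term."""
    return w[-1] + sum(wj * xj for wj, xj in zip(w, x))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(X, y, pos_weight, epochs=200, lr=0.1):
    """Logistic regression by gradient steps (an assumed stand-in classifier).
    Updates from relevant (positive) documents are scaled by pos_weight
    to counter the class imbalance."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            weight = pos_weight if yi == 1 else 1.0
            step = lr * weight * (yi - sigmoid(logit(w, xi)))
            for j, v in enumerate(xi):
                w[j] += step * v
            w[-1] += step
    return w


def pick(w, pool, X, criterion):
    """Choose the next unlabeled document to show the reviewer."""
    if criterion == "certainty":    # most confidently relevant
        return max(sorted(pool), key=lambda i: logit(w, X[i]))
    if criterion == "uncertainty":  # closest to the decision boundary
        return min(sorted(pool), key=lambda i: abs(logit(w, X[i])))
    raise ValueError("unknown criterion: " + criterion)


def screen(X, oracle, seeds, budget, criterion="certainty", pos_weight=5.0):
    """Active-learning loop: retrain, pick one document, ask the reviewer."""
    labels = {i: oracle(i) for i in seeds}
    pool = set(range(len(X))) - set(labels)
    for _ in range(min(budget, len(pool))):
        idx = sorted(labels)
        w = train([X[i] for i in idx], [labels[i] for i in idx], pos_weight)
        chosen = pick(w, pool, X, criterion)
        labels[chosen] = oracle(chosen)  # oracle plays the human reviewer
        pool.remove(chosen)
    return labels


def make_toy_pool(n_docs=100, n_relevant=10, seed=0):
    """Synthetic, imbalanced two-feature pool (illustrative data only)."""
    rng = random.Random(seed)
    truth = [1] * n_relevant + [0] * (n_docs - n_relevant)
    rng.shuffle(truth)
    X = [[rng.uniform(0.5, 1.5), rng.uniform(0.5, 1.5)] if t
         else [rng.uniform(-1.0, 0.0), rng.uniform(-1.0, 0.0)]
         for t in truth]
    return X, truth


X, truth = make_toy_pool()
seeds = [truth.index(1), truth.index(0)]  # one relevant, one irrelevant seed
found = screen(X, lambda i: truth[i], seeds, budget=15)
print(sum(found.values()), "relevant among", len(found), "screened")
```

Passing `criterion="uncertainty"` to `screen` switches to the other criterion, which queries documents nearest the decision boundary instead of those scored most likely to be relevant. The default `pos_weight=5.0` is an arbitrary illustrative value, not one taken from the paper.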
Pages: 242-253
Page count: 12