Finding better active learners for faster literature reviews

Cited by: 40
Authors
Yu, Zhe [1]
Kraft, Nicholas A. [2]
Menzies, Tim [1]
Affiliations
[1] North Carolina State Univ, Dept Comp Sci, Raleigh, NC 27695 USA
[2] ABB Corp Res, Raleigh, NC USA
Funding
US National Science Foundation;
Keywords
Active learning; Systematic literature review; Software engineering; Primary study selection; SYSTEMATIC REVIEWS; IDENTIFICATION; CLASSIFICATION; WORKLOAD; TOOLS;
DOI
10.1007/s10664-017-9587-0
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
Literature reviews can be time-consuming and tedious to complete. By cataloging and refactoring three state-of-the-art active learning techniques from evidence-based medicine and legal electronic discovery, this paper finds and implements FASTREAD, a faster technique for studying a large corpus of documents that combines and parametrizes the most efficient active learning algorithms. This paper assesses FASTREAD using datasets generated from existing SE literature reviews (Hall, Wahono, Radjenović, Kitchenham et al.). Compared to manual methods, FASTREAD lets researchers find 95% of relevant studies after reviewing an order of magnitude fewer papers. Compared to other state-of-the-art automatic methods, FASTREAD reviews 20-50% fewer studies while finding the same number of relevant primary studies in a systematic literature review.
Pages: 3161-3186
Page count: 26
Related papers
65 in total
[1]  
[Anonymous], 2012, Proceedings of the 2nd International Workshop on Evidential Assessment of software technologies
[2]  
[Anonymous], 2012, Active Learning, DOI 10.1007/978-3-031-01560-1
[3]   Latent Dirichlet allocation [J].
Blei, DM ;
Ng, AY ;
Jordan, MI .
JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 3 (4-5) :993-1022
[4]   TuneR: a framework for tuning software engineering tools with hands-on instructions in R [J].
Borg, Markus .
JOURNAL OF SOFTWARE-EVOLUTION AND PROCESS, 2016, 28 (06) :427-459
[5]  
Byron C.W., P INT HLTH INF S, P819, DOI 10.1145/2110363.2110464
[6]  
Carver Jeffrey C., 2013, 2013 ACM / IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), P203, DOI 10.1109/ESEM.2013.28
[7]  
Cohen Aaron M, 2006, AMIA Annu Symp Proc, P161
[8]  
Cohen Aaron M, 2010, AMIA Annu Symp Proc, V2010, P121
[9]   Performance of support-vector-machine-based classification on 15 systematic review topics evaluated with the WSS@95 measure [J].
Cohen, Aaron M. .
JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2011, 18 (01) :104-104
[10]   Reducing workload in systematic review preparation using automated citation classification [J].
Cohen, AM ;
Hersh, WR ;
Peterson, K ;
Yen, PY .
JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2006, 13 (02) :206-219