FAST2: An intelligent assistant for finding relevant papers

Cited by: 53
Authors
Yu, Zhe [1 ]
Menzies, Tim [1 ]
Affiliations
[1] North Carolina State Univ, Dept Comp Sci, Raleigh, NC 27695 USA
Funding
U.S. National Science Foundation
Keywords
Active learning; Literature reviews; Text mining; Semi-supervised learning; Relevance feedback; Selection process; CLASSIFICATION; WORKLOAD
DOI
10.1016/j.eswa.2018.11.021
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Literature reviews are essential for any researcher trying to keep up to date with the burgeoning software engineering literature. Finding relevant papers can be hard because of the huge number of candidate papers returned by search. FAST2 is a novel tool that assists researchers in finding the next promising paper to read. This paper describes FAST2 and tests it on four large systematic literature review datasets. We show that FAST2 robustly optimizes the human effort needed to find most (95%) of the relevant software engineering papers while also compensating for errors made by humans during the review process. The effectiveness of FAST2 can be attributed to three key innovations: (1) a novel way of applying external domain knowledge (a simple two- or three-keyword search) to guide the initial selection of papers, which helps to find relevant research papers faster and with less variance; (2) an estimator of the number of relevant papers yet to be found, which helps the reviewer decide when to stop the review; (3) a novel human error correction algorithm, which corrects a majority of human misclassifications (labeling relevant papers as non-relevant or vice versa) without imposing much extra human effort. (C) 2018 Elsevier Ltd. All rights reserved.
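The abstract describes a keyword-seeded active-learning selection process. The Python sketch below illustrates, under stated assumptions, what such a loop might look like; it is not the authors' FAST2 code. TF-IDF features, a linear SVM, certainty sampling, and all class and method names (ReviewAssistant, next_paper, record) are assumptions made purely for illustration.

# A minimal sketch, NOT the authors' implementation, of a keyword-seeded
# active-learning review loop: rank the first papers to review by a
# two- or three-keyword query, then train a classifier on the human's
# relevant/non-relevant labels and keep asking about the unreviewed
# paper scored most likely to be relevant.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel
from sklearn.svm import LinearSVC


class ReviewAssistant:
    UNREVIEWED, NONRELEVANT, RELEVANT = -1, 0, 1

    def __init__(self, abstracts, keywords):
        self.vectorizer = TfidfVectorizer(stop_words="english")
        self.X = self.vectorizer.fit_transform(abstracts)
        self.labels = np.full(len(abstracts), self.UNREVIEWED)
        # Innovation (1) in the abstract: seed the review with papers most
        # similar to a short keyword query instead of random samples.
        query = self.vectorizer.transform([" ".join(keywords)])
        self.seed_order = linear_kernel(self.X, query).ravel().argsort()[::-1]

    def next_paper(self):
        """Index of the next paper the human should review."""
        reviewed = self.labels != self.UNREVIEWED
        have_both_classes = (self.labels == self.RELEVANT).any() and \
                            (self.labels == self.NONRELEVANT).any()
        if not have_both_classes:
            # Still seeding: follow the keyword-similarity ranking.
            return int(next(i for i in self.seed_order if not reviewed[i]))
        # Active-learning step: train on the labels gathered so far and
        # query the highest-scored unreviewed candidate.
        clf = LinearSVC().fit(self.X[reviewed], self.labels[reviewed])
        scores = clf.decision_function(self.X)
        scores[reviewed] = -np.inf
        return int(scores.argmax())

    def record(self, index, is_relevant):
        """Store the human's judgment for one paper."""
        self.labels[index] = self.RELEVANT if is_relevant else self.NONRELEVANT

A usage loop under the same assumptions would repeatedly call next_paper(), show that abstract to the reviewer, and pass the judgment back via record(). The paper's other two innovations, the estimator of remaining relevant papers used as a stopping rule and the human error correction step, are not sketched here.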
Pages: 57-71
Page count: 15