Automatic classification of literature in systematic reviews on food safety using machine learning

Times cited: 18
Authors
van den Bulk, Leonieke M. [1]
Bouzembrak, Yamine [1]
Gavai, Anand [1]
Liu, Ningjing [1]
van den Heuvel, Lukas J. [1]
Marvin, Hans J. P. [1]
Affiliation
[1] Wageningen Food Safety Research, Akkermaalsbos 2, NL-6708 WB Wageningen, Netherlands
Keywords
Literature reviews; Text mining; Classification models; Document screening; Artificial intelligence; Food safety hazards; Text categorization; Sentiment analysis; Extraction
DOI
10.1016/j.crfs.2021.12.010
Chinese Library Classification
TS2 [Food industry]
Discipline code
0832
Abstract
Systematic reviews are used to collect relevant literature to answer a research question in a way that is clear, thorough, unbiased and reproducible. They are a standard method in the food safety domain for obtaining an overview of state-of-the-art research on food safety topics of interest. A disadvantage of systematic reviews, however, is that the process is time-consuming and requires expert domain knowledge. The work reported here aims to reduce the time an expert needs to screen all potentially relevant articles by applying machine learning techniques that automatically classify each article as either relevant or not relevant. Eight machine learning algorithms, and ensembles of all combinations of these algorithms, were tested on two systematic reviews on food safety (i.e. chemical hazards in cereals and leafy greens). The best performance was obtained by an ensemble of naive Bayes and a support vector machine: while retaining 95% of the relevant articles, it reduced the number of articles the expert has to read by 32.8% on average and the number of irrelevant articles by 57.8% on average. It was concluded that automatic classification of the literature in a systematic literature review can support experts in their task and save valuable time without compromising the quality of the review.
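The abstract does not say how the two classifiers are combined or how the 95% operating point is chosen. The sketch below is one plausible reading under stated assumptions, not the authors' method: it assumes the naive Bayes and SVM relevance scores are simply averaged (soft voting), picks the highest score threshold that still keeps at least 95% of the labelled relevant articles, and then reports the share of all articles and of irrelevant articles that fall below that threshold and need not be read. The function names (`soft_vote`, `threshold_for_recall`, `screening_savings`), the two metric definitions and the toy scores are hypothetical. In a real screening workflow the threshold would be fixed on labelled training or validation data rather than on the articles being screened.

```python
import math
from typing import Dict, List, Sequence


def soft_vote(nb_scores: Sequence[float], svm_scores: Sequence[float]) -> List[float]:
    """Average two classifiers' relevance scores per article.

    Assumption: plain soft voting; the paper's actual combination rule is not
    stated in the abstract.
    """
    if len(nb_scores) != len(svm_scores):
        raise ValueError("score lists must have the same length")
    return [(a + b) / 2 for a, b in zip(nb_scores, svm_scores)]


def threshold_for_recall(scores: Sequence[float], labels: Sequence[int],
                         target_recall: float = 0.95) -> float:
    """Highest threshold that keeps at least `target_recall` of relevant articles (label 1)."""
    if not 0 < target_recall <= 1:
        raise ValueError("target_recall must be in (0, 1]")
    relevant = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not relevant:
        raise ValueError("at least one relevant article is needed")
    # Number of relevant articles that must score at or above the threshold;
    # round() removes floating-point noise before ceil().
    k = math.ceil(round(target_recall * len(relevant), 9))
    return relevant[k - 1]


def screening_savings(scores: Sequence[float], labels: Sequence[int],
                      threshold: float) -> Dict[str, float]:
    """Hypothetical workload metrics when articles scoring below `threshold` are skipped."""
    kept = [s >= threshold for s in scores]
    n_total = len(labels)
    n_relevant = sum(1 for y in labels if y == 1)
    n_irrelevant = n_total - n_relevant
    kept_relevant = sum(1 for k, y in zip(kept, labels) if k and y == 1)
    skipped = n_total - sum(kept)
    skipped_irrelevant = sum(1 for k, y in zip(kept, labels) if not k and y == 0)
    return {
        # Share of relevant articles the expert still sees.
        "recall": kept_relevant / n_relevant if n_relevant else 0.0,
        # Share of all articles the expert no longer has to read.
        "workload_reduction": skipped / n_total if n_total else 0.0,
        # Share of irrelevant articles filtered out automatically.
        "irrelevant_reduction": skipped_irrelevant / n_irrelevant if n_irrelevant else 0.0,
    }


if __name__ == "__main__":
    labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # toy data: 1 = relevant
    nb = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1, 0.5, 0.05]
    svm = [0.8, 0.9, 0.5, 0.5, 0.3, 0.4, 0.1, 0.2, 0.2, 0.15]
    scores = soft_vote(nb, svm)
    t = threshold_for_recall(scores, labels, 0.95)
    print(screening_savings(scores, labels, t))
```

On this toy data the 95% target keeps all four relevant articles (ceil(0.95 × 4) = 4), so half of the ten articles and five of the six irrelevant ones can be skipped. Lowering the target recall raises the threshold and skips more articles at the cost of missing some relevant ones, which is the trade-off the 95% constraint in the abstract holds fixed.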
Pages: 84-95
Number of pages: 12