Automatic text classification to support systematic reviews in medicine

Cited by: 83
Authors
Garcia Adeva, J. J. [1]
Pikatza Atxa, J. M. [1]
Ubeda Carrillo, M. [2]
Ansuategi Zengotitabengoa, E. [2]
Affiliations
[1] Univ Basque Country UPV EHU, Dept Comp Languages & Syst, Erabaki Grp, Donostia San Sebastian 20018, Spain
[2] Donostia Univ Hosp, Donostia San Sebastian 20014, Spain
Keywords
Medical systematic reviews; Machine learning; Text mining; Text classification; CATEGORIZATION; RETRIEVAL;
DOI
10.1016/j.eswa.2013.08.047
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Medical systematic reviews answer particular questions within a very specific domain of expertise by selecting and analysing the current pertinent literature. As part of this process, the article screening phase usually requires a long time and significant effort, as it involves a group of domain experts evaluating thousands of articles in order to find the relevant instances. Our goal is to support this process through automatic tools. There is a recent trend of applying text classification methods to semi-automate the screening phase by providing decision support to the group of experts, thereby helping to reduce the required time and effort. In this work, we contribute to this line of research by performing a comprehensive set of text classification experiments on a corpus resulting from an actual systematic review in the area of Internet-Based Randomised Controlled Trials. These experiments involved applying multiple machine learning algorithms combined with several feature selection techniques to different parts of the articles (i.e., titles, abstracts, or both). Results are generally positive in terms of overall precision and recall measurements, reaching values of up to 84%. They also reveal that using only article titles yields results virtually as good as those obtained when article abstracts are added. Based on these positive results, it is clear that text classification can support the screening stage of medical systematic reviews. However, selecting the most appropriate machine learning algorithms, related methods, and text sections of articles is a neglected but important requirement because of its significant impact on the end results. (C) 2013 Elsevier Ltd. All rights reserved.
Pages: 1498-1508
Number of pages: 11