RobotReviewer: evaluation of a system for automatically assessing bias in clinical trials

Cited by: 165
Authors
Marshall, Iain J. [1]
Kuiper, Joel [2]
Wallace, Byron C. [3]
Affiliations
[1] Kings Coll London, Dept Primary Care & Publ Hlth Sci, 7th Floor,Capital House,42 Weston St, London SE1 3QD, England
[2] Univ Groningen, Univ Med Ctr Groningen, Groningen, Netherlands
[3] Univ Texas Austin, Sch Informat, Austin, TX 78712 USA
Keywords
systematic review; data mining; natural language processing; randomized controlled trials as topic; bias; RISK;
DOI
10.1093/jamia/ocv044
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Objective: To develop and evaluate RobotReviewer, a machine learning (ML) system that automatically assesses bias in clinical trials. From a (PDF-formatted) trial report, the system should determine risks of bias for the domains defined by the Cochrane Risk of Bias (RoB) tool, and extract supporting text for these judgments.
Methods: We algorithmically annotated 12,808 trial PDFs using data from the Cochrane Database of Systematic Reviews (CDSR). Trials were labeled as being at low or high/unclear risk of bias for each domain, and sentences were labeled as being informative or not. This dataset was used to train a multi-task ML model. We estimated the accuracy of ML judgments versus humans by comparing trials with two or more independent RoB assessments in the CDSR. Twenty blinded experienced reviewers rated the relevance of supporting text, comparing ML output with equivalent (human-extracted) text from the CDSR.
Results: By retrieving the top 3 candidate sentences per document (top3 recall), the best ML text was rated more relevant than text from the CDSR, but not significantly (60.4% of ML text rated 'highly relevant' vs. 56.5% of text from reviews; difference +3.9%, [-3.2% to +10.9%]). Model RoB judgments were less accurate than those from published reviews, though the difference was <10% (overall accuracy 71.0% with ML vs. 78.3% with CDSR).
Conclusion: Risk of bias assessment may be automated with reasonable accuracy. Automatically identified text supporting bias assessment is of equal quality to the manually identified text in the CDSR. This technology could substantially reduce reviewer workload and expedite evidence syntheses.
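As a rough illustration of the "top 3 candidate sentences per document" setup mentioned in the Results, the sketch below picks the highest-scoring sentences from one document. It is only a sketch under assumptions: the function name `top_k_sentences`, the example sentences, and the relevance scores are hypothetical, and the abstract does not describe how the actual system scores or ranks sentences.

```python
import heapq


def top_k_sentences(sentences, scores, k=3):
    """Return the k sentences with the highest scores, best first.

    `scores` is assumed to hold one model-assigned relevance score per
    sentence (hypothetical; not taken from the paper).
    """
    if len(sentences) != len(scores):
        raise ValueError("sentences and scores must have the same length")
    # Pair each score with its sentence index, then keep the k largest scores.
    ranked = heapq.nlargest(k, zip(scores, range(len(sentences))),
                            key=lambda pair: pair[0])
    return [sentences[index] for _, index in ranked]


if __name__ == "__main__":
    # Hypothetical sentences and scores for one bias domain.
    doc = [
        "Patients were recruited from three clinics.",
        "Allocation used a computer-generated random sequence.",
        "Outcome assessors were unaware of group assignment.",
        "Funding was provided by a national agency.",
    ]
    relevance = [0.10, 0.90, 0.50, 0.70]
    for sentence in top_k_sentences(doc, relevance):
        print(sentence)
```

If a document has fewer than `k` sentences, the function simply returns all of them in descending score order.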
Pages: 193-201
Page count: 9