Feature Engineering and a Proposed Decision-Support System for Systematic Reviewers of Medical Evidence

Cited by: 23

Authors:

Bekhuis, Tanja [1]

Tseytlin, Eugene [1]

Mitchell, Kevin J. [1]

Demner-Fushman, Dina [2]

Affiliations:

[1] University of Pittsburgh, School of Medicine, Department of Biomedical Informatics, Pittsburgh, PA 15260, USA

[2] US National Institutes of Health, Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, MD, USA

Source:

PLOS ONE | 2014 / Vol. 9 / Issue 1

Funding:

US National Institutes of Health;

Keywords:

ALGORITHM; WORKLOAD; ROBUST;

DOI:

10.1371/journal.pone.0086277

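Chinese Library Classification (CLC) codes:

O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];

Subject classification codes: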

07 ; 0710 ; 09 ;

Abstract:

Objectives: Evidence-based medicine depends on the timely synthesis of research findings. An important source of synthesized evidence resides in systematic reviews. However, a bottleneck in review production involves dual screening of citations with titles and abstracts to find eligible studies. For this research, we tested the effect of various kinds of textual information (features) on the performance of a machine learning classifier. Based on our findings, we propose an automated system to reduce screening burden as well as offer quality assurance.

Methods: We built a database of citations from 5 systematic reviews that varied with respect to domain, topic, and sponsor. Consensus judgments regarding eligibility were inferred from published reports. We extracted 5 feature sets from citations: alphabetic, alphanumeric(+), indexing, features mapped to concepts in systematic reviews, and topic models. To simulate a two-person team, we divided the data into random halves. We optimized the parameters of a Bayesian classifier, then trained and tested models on alternate data halves. Overall, we conducted 50 independent tests.

Results: All tests of summary performance (mean F3) surpassed the corresponding baseline, P<0.0001. The ranks for mean F3, precision, and classification error were statistically different across feature sets averaged over reviews; P-values for Friedman's test were .045, .002, and .002, respectively. Differences in ranks for mean recall were not statistically significant. Alphanumeric(+) features were associated with the best performance; the mean reduction in screening burden for this feature type ranged from 88% to 98% for the second pass through citations and from 38% to 48% overall.

Conclusions: A computer-assisted decision-support system based on our methods could substantially reduce the burden of screening citations for systematic review teams and solo reviewers. Additionally, such a system could deliver quality assurance both by confirming concordant decisions and by naming studies associated with discordant decisions for further consideration.
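
Summary performance is reported as F3, the F-measure with beta = 3, which weights recall more heavily than precision; that suits screening, where missing an eligible study is usually costlier than reading one extra abstract. A minimal Python sketch of the standard F-beta formula computed from confusion-matrix counts; the function name `f_beta` and the toy counts are illustrative and not taken from the paper:

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 3.0) -> float:
    """F-beta = (1 + b^2) * P * R / (b^2 * P + R), rewritten in counts.

    Substituting P = tp / (tp + fp) and R = tp / (tp + fn) gives
    (1 + b^2) * tp / ((1 + b^2) * tp + b^2 * fn + fp), which also
    avoids a zero division when all three counts are zero.
    """
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Hypothetical counts: 8 eligible studies found, 2 missed, 12 false alarms.
# Precision = 0.4 and recall = 0.8.
print(round(f_beta(8, 12, 2), 3))          # F3 → 0.727
print(round(f_beta(8, 12, 2, beta=1), 3))  # F1 → 0.533
```

With the same counts, F3 rewards the high recall far more than F1 does, which is why a screening classifier can score well on F3 while still sending many false positives to the reviewer.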
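
Two of the feature sets, alphabetic and alphanumeric(+), differ in which character strings count as tokens. The abstract does not define the "(+)" extension, so the sketch below simply assumes it means letter-and-digit tokens that may also keep internal hyphens, plus signs, slashes, or periods (as in `type-2` or `HbA1c`). Both regular expressions are hypothetical illustrations, not the authors' definitions:

```python
import re
from collections import Counter

# Hypothetical token patterns; the paper's exact rules are not given here.
ALPHABETIC = re.compile(r"[A-Za-z]+")
# Assumed meaning of "(+)": letters and digits, plus internal - + / . joiners.
ALPHANUMERIC_PLUS = re.compile(r"[A-Za-z0-9]+(?:[-+/.][A-Za-z0-9]+)*")


def bag_of_tokens(text: str, pattern: re.Pattern) -> Counter:
    """Lowercased token counts for one title-plus-abstract string."""
    return Counter(tok.lower() for tok in pattern.findall(text))


text = "HbA1c fell in 42 type-2 patients"
print(sorted(bag_of_tokens(text, ALPHABETIC)))
# → ['c', 'fell', 'hba', 'in', 'patients', 'type']
print(sorted(bag_of_tokens(text, ALPHANUMERIC_PLUS)))
# → ['42', 'fell', 'hba1c', 'in', 'patients', 'type-2']
```

The letters-only pattern splits a clinical term like `HbA1c` into fragments and drops numbers entirely, which illustrates how the choice of token pattern alone can change what a classifier sees.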
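
The evaluation design in the Methods (split the citations into random halves, train on one half and label the other, then swap) can be sketched as follows. The abstract says only "a Bayesian classifier" with optimized parameters, so this sketch substitutes a plain multinomial naive Bayes with add-alpha smoothing and a fixed `alpha`; the function names, the fixed seed, and the toy citations are all hypothetical:

```python
import math
import random
from collections import Counter


def train_nb(docs, labels, alpha=1.0):
    """Multinomial naive Bayes: log priors plus add-alpha smoothed token counts."""
    vocab = {tok for doc in docs for tok in doc}
    model = {"alpha": alpha, "vocab": vocab, "classes": {}}
    for c in set(labels):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        counts = Counter(tok for d in class_docs for tok in d)
        model["classes"][c] = {
            "log_prior": math.log(len(class_docs) / len(docs)),
            "counts": counts,
            "total": sum(counts.values()),
        }
    return model


def predict_nb(model, doc):
    """Return the class with the highest log posterior; unseen tokens are skipped."""
    a, v = model["alpha"], len(model["vocab"])

    def log_posterior(stats):
        return stats["log_prior"] + sum(
            math.log((stats["counts"][tok] + a) / (stats["total"] + a * v))
            for tok in doc
            if tok in model["vocab"]
        )

    return max(model["classes"], key=lambda c: log_posterior(model["classes"][c]))


def alternate_halves(docs, labels, seed=0):
    """Shuffle once, split into halves A and B, train on A to label B, then vice versa."""
    idx = list(range(len(docs)))
    random.Random(seed).shuffle(idx)
    half_a, half_b = idx[: len(idx) // 2], idx[len(idx) // 2 :]
    predictions = [None] * len(docs)
    for train, test in ((half_a, half_b), (half_b, half_a)):
        model = train_nb([docs[i] for i in train], [labels[i] for i in train])
        for i in test:
            predictions[i] = predict_nb(model, docs[i])
    return predictions


# Toy tokenized citations; 1 = include, 0 = exclude.
docs = [
    ["randomized", "controlled", "trial"], ["randomized", "trial", "placebo"],
    ["editorial", "opinion"], ["letter", "opinion"],
    ["trial", "placebo", "randomized"], ["narrative", "review", "opinion"],
]
labels = [1, 1, 0, 0, 1, 0]
print(alternate_halves(docs, labels))
```

Every citation ends up with exactly one prediction from a model that never saw it during training, which mirrors the idea of each simulated team member screening the half the other did not.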
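
Feature sets were compared with Friedman's test, which ranks the feature sets within each review and asks whether their rank sums differ more than chance would allow. A from-scratch Python sketch of the textbook statistic Q = 12 / (n k (k + 1)) * sum_j R_j^2 - 3 n (k + 1), with n reviews, k feature sets, and rank sums R_j. The p-value uses the usual chi-square approximation with k - 1 degrees of freedom through a closed form that holds only for even degrees of freedom (k = 5 gives 4). The sketch omits the tie correction that packages such as SciPy's `scipy.stats.friedmanchisquare` apply, and the score table is made up:

```python
import math


def average_ranks(values):
    """Rank ascending (1 = lowest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):  # sorted positions i..j hold ranks i+1..j+1
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def friedman(table):
    """table[b][t] = score of feature set t on review b; returns (Q, rank sums)."""
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for t, r in enumerate(average_ranks(row)):
            rank_sums[t] += r
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return q, rank_sums


def chi2_sf_even_df(x, df):
    """P(X >= x) for chi-square with even df: exp(-x/2) * sum_{i<df/2} (x/2)^i / i!."""
    if df % 2:
        raise ValueError("this closed form needs even degrees of freedom")
    half, term, total = x / 2.0, 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Made-up mean F3 scores: 5 reviews (rows) x 5 feature sets (columns).
scores = [
    [0.61, 0.66, 0.58, 0.55, 0.60],
    [0.70, 0.74, 0.69, 0.65, 0.68],
    [0.52, 0.57, 0.55, 0.50, 0.51],
    [0.80, 0.83, 0.78, 0.79, 0.77],
    [0.64, 0.69, 0.66, 0.60, 0.62],
]
q, sums = friedman(scores)
print(sums, round(q, 2), chi2_sf_even_df(q, len(scores[0]) - 1))
```

With only five reviews as blocks, the chi-square approximation is rough; an exact or tie-corrected computation from a statistics package would be the safer check.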
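
The quality-assurance role in the Conclusions amounts to comparing each human screening decision with the classifier's prediction for the same citation. A minimal sketch, assuming binary include (1) / exclude (0) decisions and hypothetical citation IDs; the function name `triage` is illustrative:

```python
from typing import List, Sequence, Tuple


def triage(
    ids: Sequence[str], human: Sequence[int], machine: Sequence[int]
) -> Tuple[List[str], List[Tuple[str, int, int]]]:
    """Split citations into confirmed (concordant) and flagged (discordant) decisions."""
    if not (len(ids) == len(human) == len(machine)):
        raise ValueError("ids, human, and machine must have the same length")
    confirmed, flagged = [], []
    for cid, h, m in zip(ids, human, machine):
        if h == m:
            confirmed.append(cid)
        else:
            flagged.append((cid, h, m))  # keep both calls for the reviewer
    return confirmed, flagged


confirmed, flagged = triage(["c1", "c2", "c3"], [1, 0, 0], [1, 1, 0])
print(confirmed)  # → ['c1', 'c3']
print(flagged)    # → [('c2', 0, 1)]
```

The flagged list corresponds to the studies the abstract describes as named "for further consideration"; the confirmed list is where the machine acts as a second screener.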

Pages: 10

References: 53 in total

