Automating data extraction in systematic reviews: A systematic review

Cited by: 115

Authors:

Jonnalagadda S.R. [1]

Goyal P. [2]

Huffman M.D. [3]

Affiliations:

[1] Northwestern University Feinberg School of Medicine, Division of Health and Biomedical Informatics, Department of Preventive Medicine, 750 North Lake Shore Drive, 11th Floor, Chicago, IL 60611

[2] Indian Institute of Technology, Department of Computer Science and Engineering, Kharagpur, West Bengal

[3] Northwestern University Feinberg School of Medicine, Department of Preventive Medicine, Chicago

Source:

Systematic Reviews, Vol. 4, No. 1

Keywords:

Support Vector Machine; Data Element; Conditional Random Field; PubMed Abstract; Systematic Review Process

DOI:

10.1186/s13643-015-0066-7

Abstract:

Background: Automating parts of the systematic review process, specifically the data extraction step, may be an important strategy for reducing the time needed to complete a systematic review. However, the state of the science of automatically extracting data elements from full texts has not been well described. This paper presents a systematic review of published and unpublished methods for automating data extraction in systematic reviews.

Methods: We systematically searched PubMed, IEEEXplore, and the ACM Digital Library to identify potentially relevant articles. We included reports that met the following criteria: (1) the methods or results section described which entities were, or needed to be, extracted, and (2) at least one entity was automatically extracted, with evaluation results presented for that entity. We also reviewed the citations of the included reports.

Results: Of the 1190 unique citations that met our search criteria, we found 26 published reports describing automatic extraction of at least one of more than 52 potential data elements used in systematic reviews. For 25 (48%) of these data elements, researchers had attempted to extract information automatically from the publication text. Of these, 14 (27%) data elements were completely extracted, but the largest number of data elements extracted automatically by a single study was 7. Most data elements were extracted with F-scores (the harmonic mean of sensitivity and positive predictive value) above 70%.

Conclusions: We found no unified information extraction framework tailored to the systematic review process, and published reports focused on a limited number (1 to 7) of data elements. Biomedical natural language processing techniques have not yet been exploited to automate the data extraction step of systematic reviews, either fully or partially.

© 2015 Jonnalagadda et al.
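The abstract reports extraction accuracy as F-scores computed from sensitivity and positive predictive value (PPV). The short Python sketch below shows that calculation for the common F1 variant, the harmonic mean of the two. The confusion-matrix counts are hypothetical illustration values, not data from the review, and the helper names (`sensitivity`, `ppv`, `f1_score`) are introduced here only for the example. The last line checks the abstract's coverage percentages, assuming 52 candidate data elements as the denominator.

```python
"""Sketch: F1 as the harmonic mean of sensitivity (recall) and PPV (precision)."""


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true data elements that the extractor found: TP / (TP + FN)."""
    return tp / (tp + fn)


def ppv(tp: int, fp: int) -> float:
    """Fraction of extracted data elements that are correct: TP / (TP + FP)."""
    return tp / (tp + fp)


def f1_score(sens: float, prec: float) -> float:
    """Harmonic mean of sensitivity and PPV; defined as 0 when both are 0."""
    if sens + prec == 0:
        return 0.0
    return 2 * sens * prec / (sens + prec)


if __name__ == "__main__":
    # Hypothetical counts for a single data element (not taken from the review).
    tp, fp, fn = 80, 20, 40
    s = sensitivity(tp, fn)  # 80 / 120
    p = ppv(tp, fp)          # 80 / 100
    f = f1_score(s, p)
    print(f"sensitivity={s:.3f} ppv={p:.3f} F1={f:.3f}")
    # Coverage figures from the abstract, assuming 52 candidate data elements.
    print(f"{25 / 52:.0%} attempted, {14 / 52:.0%} completely extracted")
```

With these counts the sketch prints a sensitivity of 0.667, a PPV of 0.800 and an F1 of 0.727, which is slightly below the arithmetic mean of 0.733. The harmonic mean penalizes an imbalance between the two rates, so an extractor cannot reach a high F-score by maximizing only one of them.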
