Ensemble Approaches to Recognize Protected Health Information in Radiology Reports

Cited by: 2
Authors
Horng, Hannah [1]
Steinkamp, Jackson [2]
Kahn, Charles E., Jr. [2, 3]
Cook, Tessa S. [2]
Affiliations
[1] Univ Penn, Dept Bioengn, Philadelphia, PA 19104 USA
[2] Univ Penn, Dept Radiol, Philadelphia, PA 19104 USA
[3] Univ Penn, Inst Biomed Informat, Philadelphia, PA 19104 USA
Keywords
Natural language processing; De-identification; Protected health information (PHI); Reporting; Machine learning; Ensemble models
DOI
10.1007/s10278-022-00673-0
Chinese Library Classification (CLC) number
R8 [Special medicine]; R445 [Diagnostic imaging]
Subject classification codes
1002; 100207; 1009
Abstract
Natural language processing (NLP) techniques for electronic health records (EHRs) have shown great potential to improve the quality of medical care. The text of radiology reports frequently constitutes a large fraction of EHR data and can provide valuable information about patients' diagnoses, medical history, and imaging findings. The lack of a major public repository of radiology reports severely limits the development, testing, and application of new NLP tools. De-identification of protected health information (PHI) presents a major challenge to building such repositories, as many automated de-identification tools were trained or designed for clinical notes and do not perform well enough to support a public database of radiology reports. We developed and evaluated six ensemble models based on three publicly available de-identification tools: MIT de-id, NeuroNER, and Philter. A set of 1023 reports was set aside as the testing partition. Two individuals with medical training annotated the test set for PHI; differences were resolved by consensus. Ensemble methods included simple voting schemes (1-Vote, 2-Votes, and 3-Votes), a decision tree, a naive Bayesian classifier, and AdaBoost. The 1-Vote ensemble achieved recall of 998 / 1043 (95.7%); the 3-Votes ensemble had precision of 1035 / 1043 (99.2%). F1 scores were 93.4% for the decision tree, 71.2% for the naive Bayesian classifier, and 87.5% for the boosting method. Basic voting algorithms and machine learning classifiers that incorporate the predictions of multiple tools can outperform each tool acting alone in de-identifying radiology reports. Ensemble methods hold substantial potential to improve automated de-identification of radiology reports, making such reports more available for research that improves patient care and outcomes.
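The voting schemes named in the abstract can be illustrated with a minimal sketch. It assumes each tool emits one binary PHI label per token and that a k-Votes ensemble flags a token when at least k tools flag it; the abstract does not state the unit of voting, so this is an assumption rather than the authors' implementation. The function names and the toy labels are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Tuple


def k_vote_ensemble(predictions: Dict[str, List[int]], k: int) -> List[int]:
    """Flag a token as PHI (1) when at least k tools flag it.

    Assumption: every tool labels the same token sequence with 0/1 labels.
    """
    columns = list(predictions.values())
    if not columns:
        return []
    n_tokens = len(columns[0])
    if any(len(col) != n_tokens for col in columns):
        raise ValueError("all tools must label the same tokens")
    return [int(sum(votes) >= k) for votes in zip(*columns)]


def precision_recall_f1(gold: List[int], pred: List[int]) -> Tuple[float, float, float]:
    """Token-level precision, recall, and F1 for the PHI class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy labels for six tokens; NOT data from the study.
gold = [1, 0, 1, 1, 0, 0]
toy_predictions = {
    "mit_deid": [1, 0, 0, 1, 1, 0],
    "neuroner": [1, 0, 1, 0, 0, 0],
    "philter": [1, 1, 1, 0, 0, 0],
}

results = {}
for k in (1, 2, 3):
    ensemble = k_vote_ensemble(toy_predictions, k)
    results[k] = (ensemble, precision_recall_f1(gold, ensemble))
    print(f"{k}-Vote(s): labels={ensemble}")
```

On these toy labels, lowering k catches more PHI at the cost of more false positives, while raising k does the opposite. This matches the pattern in the abstract, where the 1-Vote ensemble is reported for its recall and the 3-Votes ensemble for its precision.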
Pages: 1694-1698
Page count: 5