DEDUCE: A pattern matching method for automatic de-identification of Dutch medical text

Cited by: 35
Authors
Menger, Vincent [1]
Scheepers, Floor [2]
van Wijk, Lisette Maria [1]
Spruit, Marco [1]
Affiliations
[1] Univ Utrecht, Dept Informat & Comp Sci, POB 80089, NL-3508 TB Utrecht, Netherlands
[2] Univ Med Ctr Utrecht, Dept Psychiat, POB 85500, NL-3508 GA Utrecht, Netherlands
Keywords
De-identification; Dutch medical text; Pattern matching; Protected Health Information; Patient privacy; HEALTH INFORMATION; RECORDS; IMPACT;
DOI
10.1016/j.tele.2017.08.002
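Chinese Library Classification (CLC) number
G25 [Library science, librarianship]; G35 [Information science, information work];
Discipline classification code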
1205 ; 120501 ;
Abstract
To use medical text for research purposes, the text must first be de-identified for legal and privacy reasons. We report on a pattern matching method that automatically de-identifies medical text written in Dutch and requires little effort to hand-tailor. First, a selection of Protected Health Information (PHI) categories is determined in cooperation with medical staff. Then, we devise a method for de-identifying all information in any of these PHI categories that relies on lookup tables, decision rules and fuzzy string matching. Our de-identification method, DEDUCE, is validated on a test corpus of 200 nursing notes and 200 treatment plans obtained from the University Medical Center Utrecht (UMCU) in the Netherlands, achieving a total micro-averaged precision of 0.814, a recall of 0.916 and an F1-score of 0.862. For person names, a recall of 0.964 was achieved, and no patient names were missed.
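The abstract names three ingredients (lookup tables, decision rules and fuzzy string matching) without giving details. The following is a minimal Python sketch of how such ingredients can be combined for one PHI category, person names. It is not the DEDUCE implementation: the table `KNOWN_NAMES`, the title rule in `TITLES`, the similarity threshold and the `<PERSON>` tag are all hypothetical, and the standard-library `difflib.SequenceMatcher` stands in for whatever fuzzy matcher the authors used. The last function only checks that the reported F1-score is the harmonic mean of the reported precision and recall.

```python
import re
from difflib import SequenceMatcher

# Hypothetical lookup table; the real DEDUCE name lists are not reproduced here.
KNOWN_NAMES = {"jansen", "bakker", "visser"}
# Hypothetical decision rule: a capitalised word right after a title is a name.
TITLES = {"dhr", "mevr", "dr"}


def is_fuzzy_match(token, table, threshold=0.85):
    """Return True if the token is similar enough to any entry in the table."""
    lowered = token.lower()
    return any(
        SequenceMatcher(None, lowered, entry).ratio() >= threshold
        for entry in table
    )


def deidentify(text):
    """Replace words flagged as person names by the tag <PERSON>."""
    out = []
    previous_word = ""
    # Split into alternating runs of word and non-word characters.
    for token in re.findall(r"\w+|\W+", text):
        if not token[0].isalnum():
            out.append(token)  # punctuation and whitespace pass through
            continue
        after_title = previous_word.lower() in TITLES and token[0].isupper()
        # Skip very short words to limit accidental fuzzy hits.
        in_table = len(token) > 3 and is_fuzzy_match(token, KNOWN_NAMES)
        out.append("<PERSON>" if after_title or in_table else token)
        previous_word = token
    return "".join(out)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


print(deidentify("Dhr. Janssen sprak met mevr. Pietersen over Bakker."))
print(round(f1_score(0.814, 0.916), 3))
```

In this sketch "Janssen" is caught both by the title rule and by fuzzy similarity to the table entry "jansen", while "Pietersen" is caught by the title rule alone, which illustrates why rules and lookup tables complement each other.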
Pages: 727-736
Page count: 10
Related papers
50 in total
  • [1] An Automatic System to Detect and Extract Text in Medical Images for De-identification
    Zhu, Yingxuan
    Singh, P. D.
    Siddiqui, Khan
    Gillam, Michael
    MEDICAL IMAGING 2010: ADVANCED PACS-BASED IMAGING INFORMATICS AND THERAPEUTIC APPLICATIONS, 2010, 7628
  • [2] Is Multiclass Automatic Text De-Identification Worth the Effort?
    Duy Duc An Bui
    Redden, David T.
    Cimino, James J.
    METHODS OF INFORMATION IN MEDICINE, 2018, 57 (04) : 177 - 184
  • [3] Leveraging text skeleton for de-identification of electronic medical records
    Yue-Shu Zhao
    Kun-Li Zhang
    Hong-Chao Ma
    Kun Li
    BMC Medical Informatics and Decision Making, 18
  • [4] Leveraging text skeleton for de-identification of electronic medical records
    Zhao, Yue-Shu
    Zhang, Kun-Li
    Ma, Hong-Chao
    Li, Kun
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2018, 18
  • [5] Automated de-identification of free-text medical records
    Ishna Neamatullah
    Margaret M Douglass
    Li-wei H Lehman
    Andrew Reisner
    Mauricio Villarroel
    William J Long
    Peter Szolovits
    George B Moody
    Roger G Mark
    Gari D Clifford
    BMC Medical Informatics and Decision Making, 8
  • [6] Automated de-identification of free-text medical records
    Neamatullah, Ishna
    Douglass, Margaret M.
    Lehman, Li-wei H.
    Reisner, Andrew
    Villarroel, Mauricio
    Long, William J.
    Szolovits, Peter
    Moody, George B.
    Mark, Roger G.
    Clifford, Gari D.
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2008, 8 (1)
  • [7] Automatic Clinical Text De-Identification: Is It Worth It, and Could It Work for Me?
    Meystre, Stephane M.
    Dalianis, Hercules
    Aberdeen, John
    Malin, Brad
    MEDINFO 2013: PROCEEDINGS OF THE 14TH WORLD CONGRESS ON MEDICAL AND HEALTH INFORMATICS, PTS 1 AND 2, 2013, 192 : 1242 - 1242
  • [8] Survey on RNN and CRF models for de-identification of medical free text
    Leevy, Joffrey L.
    Khoshgoftaar, Taghi M.
    Villanustre, Flavio
    JOURNAL OF BIG DATA, 2020, 7 (01)
  • [9] Survey on RNN and CRF models for de-identification of medical free text
    Joffrey L. Leevy
    Taghi M. Khoshgoftaar
    Flavio Villanustre
    Journal of Big Data, 7
  • [10] A Short Survey of LSTM Models for De-identification of Medical Free Text
    Leevy, Joffrey L.
    Khoshgoftaar, Taghi M.
    2020 IEEE 6TH INTERNATIONAL CONFERENCE ON COLLABORATION AND INTERNET COMPUTING (CIC 2020), 2020, : 117 - 124