DEDUCE: A pattern matching method for automatic de-identification of Dutch medical text

Cited by: 35
Authors
Menger, Vincent [1]
Scheepers, Floor [2]
van Wijk, Lisette Maria [1]
Spruit, Marco [1]
Affiliations
[1] Univ Utrecht, Dept Informat & Comp Sci, POB 80089, NL-3508 TB Utrecht, Netherlands
[2] Univ Med Ctr Utrecht, Dept Psychiat, POB 85500, NL-3508 GA Utrecht, Netherlands
Keywords
De-identification; Dutch medical text; Pattern matching; Protected Health Information; Patient privacy; HEALTH INFORMATION; RECORDS; IMPACT
DOI
10.1016/j.tele.2017.08.002
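Chinese Library Classification (CLC) number
G25 [Library science, library services]; G35 [Information science, information work]
Subject classification code
1205; 120501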
Abstract
To use medical text for research purposes, the text must first be de-identified for legal and privacy reasons. We report on a pattern matching method for automatically de-identifying medical text written in Dutch that requires little hand-tailoring effort. First, a selection of Protected Health Information (PHI) categories is determined in cooperation with medical staff. Then, we devise a method, relying on lookup tables, decision rules and fuzzy string matching, for de-identifying all information that falls into one of these PHI categories. Our de-identification method DEDUCE is validated on a test corpus of 200 nursing notes and 200 treatment plans obtained from the University Medical Center Utrecht (UMCU) in the Netherlands, achieving a total micro-averaged precision of 0.814, recall of 0.916 and F1-score of 0.862. For person names, a recall of 0.964 was achieved, while no patient names were missed.
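The abstract names three ingredients: lookup tables, decision rules and fuzzy string matching. The Python sketch below shows one way they might combine for a single PHI category (person names). It is a minimal illustration under stated assumptions, not DEDUCE's actual implementation: the name list `FIRST_NAMES`, the title rule (`dhr.`, `mevr.`, `dr.`), the whitespace tokenisation, the edit-distance threshold of 1, the minimum word length `MIN_FUZZY_LEN` and the `<PERSON>` tag are all hypothetical. The helper `f1_score` only recomputes an F1-score as the harmonic mean of precision and recall.

```python
from typing import Sequence

# Hypothetical lookup table of Dutch first names (a real system would use a large list).
FIRST_NAMES = {"jan", "piet", "anna", "lisa"}
# Hypothetical decision rule: a capitalised word directly after one of these titles is a name.
TITLES = {"dhr.", "mevr.", "dr."}
# Assumed guard: only words of at least this length are fuzzy-matched,
# to avoid spurious hits on short function words.
MIN_FUZZY_LEN = 4


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def is_person_name(word: str, previous: str,
                   patient_names: Sequence[str], max_dist: int) -> bool:
    low = word.lower()
    if low in FIRST_NAMES:                                  # 1. lookup table
        return True
    if previous.lower() in TITLES and word[:1].isupper():   # 2. decision rule
        return True
    if len(low) >= MIN_FUZZY_LEN:                           # 3. fuzzy string matching
        return any(levenshtein(low, p.lower()) <= max_dist for p in patient_names)
    return False


def deidentify(text: str, patient_names: Sequence[str], max_dist: int = 1) -> str:
    """Replace each token judged to be a person name with a <PERSON> tag.

    Tokens are split on whitespace and re-joined with single spaces.
    """
    tokens = text.split()
    out = []
    for i, tok in enumerate(tokens):
        word = tok.strip(".,;:")  # match on the bare word, keep its punctuation
        previous = tokens[i - 1] if i > 0 else ""
        if word and is_person_name(word, previous, patient_names, max_dist):
            tok = tok.replace(word, "<PERSON>")
        out.append(tok)
    return " ".join(out)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```

For example, `deidentify("Mevr. Jansen zag dhr. Bakker en Piet, zoon van Janssen.", ["Jansen"])` returns `"Mevr. <PERSON> zag dhr. <PERSON> en <PERSON>, zoon van <PERSON>."`: the title rule catches *Jansen* and *Bakker*, the lookup table catches *Piet*, and fuzzy matching catches the misspelled *Janssen*. Likewise, `f1_score(0.814, 0.916)` rounds to 0.862, consistent with the figures reported above.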
Pages: 727 - 736
Page count: 10
Related papers
  • [21] Evaluating the state-of-the-art in automatic de-identification
    Uzuner, Oezlem
    Luo, Yuan
    Szolovits, Peter
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2007, 14 (05) : 550 - 563
  • [22] Automatic De-Identification of Medical Records with a Multilevel Hybrid Semi-Supervised Learning Approach
    Nguyen Dong Phuong
    Vo Thi Ngoc Chau
    2016 IEEE RIVF INTERNATIONAL CONFERENCE ON COMPUTING & COMMUNICATION TECHNOLOGIES, RESEARCH, INNOVATION, AND VISION FOR THE FUTURE (RIVF), 2016 : 43 - 48
  • [23] De-identification algorithm for free-text nursing notes
    Douglass, MM
    Clifford, GD
    Reisner, A
    Long, WJ
    Moody, GB
    Mark, RG
    COMPUTERS IN CARDIOLOGY 2005, VOL 32, 2005, 32 : 331 - 334
  • [24] De-identification of Clinical Text for Secondary Use: Research Issues
    Berg, Hanna
    Henriksson, Aron
    Fors, Uno
    Dalianis, Hercules
    HEALTHINF: PROCEEDINGS OF THE 14TH INTERNATIONAL JOINT CONFERENCE ON BIOMEDICAL ENGINEERING SYSTEMS AND TECHNOLOGIES - VOL. 5: HEALTHINF, 2021 : 592 - 599
  • [25] Effects of personal identifier resynthesis on clinical text de-identification
    Yeniterzi, Reyyan
    Aberdeen, John
    Bayer, Samuel
    Wellner, Ben
    Hirschman, Lynette
    Malin, Bradley
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2010, 17 (02) : 159 - 168
  • [26] An open source toolkit for medical imaging de-identification
    Rodríguez González, David
    Carpenter, Trevor
    van Hemert, Jano I.
    Wardlaw, Joanna
    EUROPEAN RADIOLOGY, 2010, 20 (08) : 1896 - 1904
  • [27] Medical Image De-Identification using Cloud Services
    Kopchick, B.
    Klenk, J.
    Carlson, T.
    Kumpatla, M.
    Klimov, S.
    Mikdadi, D.
    Pan, Q.
    Gustafson, S.
    Kaltman, J.
    Wagner, U.
    Clunie, D.
    Farahani, K.
    MEDICAL IMAGING 2022: IMAGING INFORMATICS FOR HEALTHCARE, RESEARCH, AND APPLICATIONS, 2022, 12037
  • [30] A DICOM dataset for evaluation of medical image de-identification
    Rutherford, Michael
    Mun, Seong K.
    Levine, Betty
    Bennett, William
    Smith, Kirk
    Farmer, Phil
    Jarosz, Quasar
    Wagner, Ulrike
    Freyman, John
    Blake, Geri
    Tarbox, Lawrence
    Farahani, Keyvan
    Prior, Fred
    SCIENTIFIC DATA, 2021, 8 (01)