Deep learning-based patient re-identification is able to exploit the biometric nature of medical chest X-ray data

Cited by: 26
Authors
Packhaeuser, Kai [1]
Guendel, Sebastian [1]
Muenster, Nicolas [1]
Syben, Christopher [1]
Christlein, Vincent [1]
Maier, Andreas [1]
Affiliations
[1] Friedrich Alexander Univ Erlangen Nurnberg, Dept Comp Sci, Pattern Recognit Lab, D-91058 Erlangen, Germany
Keywords
PRIVACY;
DOI
10.1038/s41598-022-19045-3
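Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Subject classification code
07; 0710; 09;
Abstract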
With the rise and ever-increasing potential of deep learning techniques in recent years, publicly available medical datasets have become a key factor in enabling the reproducible development of diagnostic algorithms in the medical domain. Medical data contains sensitive patient-related information and is therefore usually anonymized by removing patient identifiers (e.g., patient names) before publication. To the best of our knowledge, we are the first to show that a well-trained deep learning system is able to recover the patient identity from chest X-ray data. We demonstrate this using the publicly available large-scale ChestX-ray14 dataset, a collection of 112,120 frontal-view chest X-ray images from 30,805 unique patients. Our verification system is able to identify whether two frontal chest X-ray images are from the same person with an AUC of 0.9940 and a classification accuracy of 95.55%. We further highlight that the proposed system is able to reveal the same person even ten or more years after the initial scan. When pursuing a retrieval approach, we observe an mAP@R of 0.9748 and a precision@1 of 0.9963. Furthermore, we achieve an AUC of up to 0.9870 and a precision@1 of up to 0.9444 when evaluating our trained networks on external datasets such as CheXpert and the COVID-19 Image Data Collection. Given this high identification rate, a potential attacker may leak patient-related information and additionally cross-reference images to obtain more information. Thus, there is a great risk of sensitive content falling into unauthorized hands or being disseminated against the will of the patients concerned. Numerous chest X-ray datasets have been published to advance research, especially during the COVID-19 pandemic. Such data may therefore be vulnerable to attacks by deep learning-based re-identification algorithms.
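The retrieval metrics quoted in the abstract (precision@1 and mAP@R) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `retrieval_metrics`, the use of cosine similarity on image embeddings, and the toy data are all assumptions made for illustration. Here R is taken to be the number of other images of the same patient as the query, and mAP@R averages the precision at each correct rank within the top R results.

```python
import numpy as np


def retrieval_metrics(embeddings, patient_ids):
    """Return (precision@1, mAP@R), averaged over every query image that has
    at least one other image of the same patient.

    Hypothetical helper: cosine similarity is an assumption, not a detail
    taken from the abstract.
    """
    x = np.asarray(embeddings, dtype=float)
    ids = np.asarray(patient_ids)

    # L2-normalise so that the dot product equals cosine similarity.
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    sim = x @ x.T
    np.fill_diagonal(sim, -np.inf)  # a query must never retrieve itself

    p_at_1, ap_at_r = [], []
    for q in range(len(ids)):
        same = ids == ids[q]
        same[q] = False
        r = int(same.sum())  # R: number of relevant images for this query
        if r == 0:
            continue  # patients with a single image cannot be evaluated

        order = np.argsort(-sim[q], kind="stable")  # most similar first
        hits = same[order]

        p_at_1.append(float(hits[0]))

        top = hits[:r].astype(float)
        precision_at_i = np.cumsum(top) / np.arange(1, r + 1)
        # Only ranks holding a correct match contribute to the sum.
        ap_at_r.append(float((precision_at_i * top).sum() / r))

    return float(np.mean(p_at_1)), float(np.mean(ap_at_r))


# Toy data: two patients with two images each, well separated.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(retrieval_metrics(toy_embeddings, [0, 0, 1, 1]))
```

In this sketch, a precision@1 close to 1 means that the single most similar image almost always belongs to the same patient as the query, which is the re-identification scenario the abstract warns about.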
Pages: 13