N-Sanitization: A semantic privacy-preserving framework for unstructured medical datasets

Cited by: 60
Authors
Iwendi, Celestine [1, 2]
Moqurrab, Syed Atif [3, 4]
Anjum, Adeel [4 ]
Khan, Sangeen [4 ]
Mohan, Senthilkumar [5 ]
Srivastava, Gautam [6, 7]
Affiliations
[1] Bcc Cent South Univ Forestry & Technol, Changsha 410004, Peoples R China
[2] Coal City Univ Enugu, Dept Math & Comp Sci, Enugu 400231, Nigeria
[3] Air Univ Islamabad, Islamabad 44000, Pakistan
[4] Comsats Inst Informat Technol, Islamabad 45550, Pakistan
[5] Vellore Inst Technol, Sch Informat Technol & Engn, Vellore 632014, Tamil Nadu, India
[6] Brandon Univ, Dept Math & Comp Sci, Brandon, MB R7A 6A9, Canada
[7] China Med Univ, Res Ctr Interneural Comp, Taichung 40402, Taiwan
Keywords
Anonymization; Document sanitization; Textual-privacy; Negated assertion; Medical data; IoMT; DE-IDENTIFICATION; ANONYMIZATION; NETWORK; SYSTEM; MODEL;
DOI
10.1016/j.comcom.2020.07.032
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The introduction and rapid growth of the Internet of Medical Things (IoMT), a subset of the Internet of Things (IoT) in medical and healthcare systems, has brought numerous changes and challenges to those systems. Healthcare organizations share patient data with research organizations to support medical discoveries. Releasing such information is difficult because it puts patient privacy at risk: textual health documents about an individual contain specific sensitive terms that need to be sanitized before the documents can be released. Recent approaches improved the utility of protected output by substituting sensitive terms with appropriate "generalizations" retrieved from several medical and general-purpose knowledge bases (KBs). However, these approaches perform unnecessary sanitization by anonymizing negated assertions, e.g., AIDS-negative. This paper proposes a semantic privacy framework that effectively sanitizes the sensitive and semantically related terms in healthcare documents. The proposed model identifies negated assertions (e.g., AIDS-negative) before the sanitization process in IoMT, which further improves the utility of sanitized documents. Moreover, besides considering sensitive medical findings, we also incorporated state-of-the-art metrics, i.e., Protected Health Information (PHI), as defined in privacy rules such as the Health Insurance Portability and Accountability Act (HIPAA), Informatics for Integrating Biology & the Bedside (i2b2), and Materialize Interactive Medical Image Control System (MIMICS). The proposed approach is evaluated on real clinical data provided by i2b2. On average, detection accuracy (for both PHIs and medical findings) is improved, with Precision, Recall, and F-measure scores at 21%, 51%, and 54%, respectively. The overall data utility of our proposed model is improved by 8% compared to C-sanitized and by 25% compared to a simple reduction approach. Experimental results show that our approach effectively manages the privacy and utility trade-off as compared to its counterparts.
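The idea of checking for negation before sanitizing can be illustrated with a minimal sketch. This is not the authors' implementation: the term-to-generalization table, the cue lists, and the function names `is_negated` and `sanitize` are all hypothetical, and a real system would draw generalizations from medical knowledge bases and use a far richer negation detector. The sketch only shows why a negated assertion such as "AIDS-negative" can be left intact while an affirmed finding is generalized.

```python
import re

# Hypothetical mapping from sensitive terms to generalizations (assumption);
# a real system would retrieve these from medical knowledge bases.
GENERALIZATIONS = {
    "aids": "an immune system condition",
    "hepatitis": "a liver condition",
}

# Hypothetical negation cues (assumption), checked immediately before or
# after the sensitive term.
PRE_CUES = ("no", "denies", "without", "negative for")
POST_CUES = ("negative", "ruled out")

TERM_PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, GENERALIZATIONS)) + r")\b",
    re.IGNORECASE,
)


def is_negated(sentence: str, start: int, end: int) -> bool:
    """Return True if a negation cue directly precedes or follows the term."""
    before = sentence[:start].lower().rstrip(" -")
    after = sentence[end:].lower().lstrip(" -")
    pre = any(re.search(rf"\b{re.escape(cue)}$", before) for cue in PRE_CUES)
    post = any(re.match(rf"{re.escape(cue)}\b", after) for cue in POST_CUES)
    return pre or post


def sanitize(sentence: str) -> str:
    """Generalize affirmed sensitive terms; leave negated ones untouched."""

    def replace(match: re.Match) -> str:
        if is_negated(sentence, match.start(), match.end()):
            return match.group(0)  # negated assertion: keep for utility
        return GENERALIZATIONS[match.group(0).lower()]

    return TERM_PATTERN.sub(replace, sentence)


if __name__ == "__main__":
    for text in ("Patient is AIDS-negative.", "History of hepatitis."):
        print(sanitize(text))
```

In this toy version the negated sentence passes through unchanged, while the affirmed finding is replaced by its generalization, which is the utility gain the abstract describes.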
Pages: 160-171
Page count: 12