Decision Tree-Based Anonymized Electronic Health Record Fusion for Public Health Informatics

Cited by: 0
Authors
Khalique, Fatima [1]
Khan, Shoab Ahmed [2]
Mubarak, Qurat-ul-ain [2]
Safdar, Hasan [3]
Affiliations
[1] Natl Univ Sci & Technol, Islamabad, Pakistan
[2] NUST, Coll Elect & Mech Engn, Islamabad, Pakistan
[3] Ctr Adv Studies Engn, Islamabad, Pakistan
Source
INTELLIGENT COMPUTING, VOL 1 | 2019 / Vol. 858
Keywords
Electronic Health Record (EHR); Demographic anonymization; Duplicate detection; Patient record linking; Health data exchange; Health data privacy; Decision tree; Hashing
DOI
10.1007/978-3-030-01174-1_30
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Electronic Health Records (EHRs) are frequently used in Health Information Exchanges to fuse data on the same patients for public health informatics, with demographic attributes serving as the linking key. Fusing this information across multiple health care entities presents a two-fold complexity. First, privacy constraints on sharing demographic information across organizations are stringent, which requires encrypting or hashing records for anonymity. Second, fusing anonymized data raises the problem of finding duplicate records and accurately linking incoming information to existing records. This paper presents a methodology that allows the office of any public health department to acquire health data while preserving the privacy, integrity, and usefulness of the data. Our novel duplicate detection algorithm combines cryptographic hashing with machine learning techniques to link patients' records approximately by identifying duplicate and unique records. Experimental results on three different datasets show that the proposed methodology effectively detects duplicates based on encoded demographic data from EHRs. In addition, the proposed methodology can potentially be applied to record matching in other domains with encoded data.
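The idea of matching records on hashed demographic attributes can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the field names, the shared key `SECRET_KEY`, the use of HMAC-SHA256, and the hand-written rules in `classify` are all assumptions made for illustration. In particular, `classify` only stands in for a decision tree that would be learned from labeled record pairs, and exact per-field hash comparison does not capture the approximate linking described in the abstract.

```python
import hashlib
import hmac

# Hypothetical key shared by the participating organizations (assumption).
SECRET_KEY = b"example-shared-key"

# Hypothetical demographic attributes used for linking (assumption).
FIELDS = ("first_name", "last_name", "birth_date", "gender")


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash alike."""
    return " ".join(value.strip().lower().split())


def anonymize(record: dict) -> dict:
    """Replace each demographic field with a keyed SHA-256 digest."""
    return {
        field: hmac.new(
            SECRET_KEY,
            normalize(record[field]).encode("utf-8"),
            hashlib.sha256,
        ).hexdigest()
        for field in FIELDS
    }


def agreement_vector(a: dict, b: dict) -> tuple:
    """Return 1 per field whose digests match, 0 otherwise."""
    return tuple(int(hmac.compare_digest(a[f], b[f])) for f in FIELDS)


def classify(vector: tuple) -> str:
    """Hand-written stand-in for a learned decision tree (assumption)."""
    first, last, birth_date, gender = vector
    if not birth_date:
        return "unique"
    if not last:
        return "unique"
    if first:
        return "duplicate"
    return "duplicate" if gender else "unique"


incoming = anonymize({"first_name": "Ada ", "last_name": "LOVELACE",
                      "birth_date": "1815-12-10", "gender": "F"})
existing = anonymize({"first_name": "ada", "last_name": "Lovelace",
                      "birth_date": "1815-12-10", "gender": "f"})
print(classify(agreement_vector(incoming, existing)))
```

Hashing each attribute separately, rather than the whole record at once, is what lets a classifier weigh partial agreement between two anonymized records; a single whole-record digest would only reveal exact equality.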
Pages: 404-414
Number of pages: 11