Decision Tree-Based Anonymized Electronic Health Record Fusion for Public Health Informatics

Times cited: 0

Authors:

Khalique, Fatima [1]

Khan, Shoab Ahmed [2]

Mubarak, Qurat-ul-ain [2]

Safdar, Hasan [3]

Affiliations:

[1] Natl Univ Sci & Technol, Islamabad, Pakistan

[2] NUST, Coll Elect & Mech Engn, Islamabad, Pakistan

[3] Ctr Adv Studies Engn, Islamabad, Pakistan

Source:

INTELLIGENT COMPUTING, VOL 1 | 2019 / Vol. 858

Keywords:

Electronic Health Record (EHR); Demographic anonymization; Duplicate detection; Patient record linking; Health data exchange; Health data privacy; Decision tree; Hashing;

DOI:

10.1007/978-3-030-01174-1_30

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Electronic Health Records (EHRs) are frequently used in Health Information Exchanges to fuse data on the same patients for public health informatics by way of their demographic attributes. Fusing this information across multiple health care entities presents a two-fold complexity. First, privacy constraints on sharing demographic information across organizations are stringent, which requires encrypting or hashing records for anonymity. Second, fusing anonymized data leads to the problem of finding duplicate records and accurately linking incoming information to existing records. This paper presents a methodology by which the office of any public health department can acquire health data while preserving the privacy, integrity and usefulness of the data. Our novel duplicate detection algorithm combines cryptographic hashing and machine learning techniques for approximate linking of patients' records by identifying duplicate and unique records. Experimental results on three different datasets show that the proposed methodology can effectively detect duplicates based on encoded demographic data from EHRs. In addition, the proposed methodology can potentially be applied to record matching in other domains with encoded data.
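
The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea it describes: hash each demographic field separately, compare two records by which hashed fields agree, and pass that agreement pattern to a tree-shaped classifier. Everything specific here is an assumption made for illustration and is not taken from the paper: the field names in `FIELDS`, the use of keyed SHA-256 (HMAC), the normalization step, the sample records, and the hand-written `classify` rule, which stands in for a decision tree that would in practice be learned from labeled record pairs.

```python
import hashlib
import hmac

# Hypothetical demographic fields; the paper's actual attribute set is not given here.
FIELDS = ("first_name", "last_name", "dob", "gender", "city")


def normalize(value):
    """Lower-case and collapse whitespace so trivial variants hash identically."""
    return " ".join(value.strip().lower().split())


def anonymize(record, key):
    """Replace every demographic field with a keyed SHA-256 digest (HMAC)."""
    return {
        field: hmac.new(
            key, normalize(record[field]).encode("utf-8"), hashlib.sha256
        ).hexdigest()
        for field in FIELDS
    }


def agreement(hashed_a, hashed_b):
    """Field-by-field comparison of two anonymized records."""
    return {f: hmac.compare_digest(hashed_a[f], hashed_b[f]) for f in FIELDS}


def classify(agree):
    """Hand-written stand-in for a learned decision tree (assumed rule)."""
    if agree["dob"]:
        if agree["last_name"]:
            return "duplicate"
        if agree["first_name"] and agree["gender"]:
            return "duplicate"
        return "unique"
    if all(agree[f] for f in ("first_name", "last_name", "gender", "city")):
        return "duplicate"
    return "unique"


# Made-up records, for illustration only.
key = b"shared-secret-agreed-between-sites"
rec_a = {"first_name": "Ayesha", "last_name": "Khan", "dob": "1985-03-12",
         "gender": "F", "city": "Islamabad"}
rec_b = {"first_name": " ayesha ", "last_name": "KHAN", "dob": "1985-03-12",
         "gender": "f", "city": "Rawalpindi"}
rec_c = {"first_name": "Bilal", "last_name": "Ahmed", "dob": "1990-07-01",
         "gender": "M", "city": "Islamabad"}

hashed_a, hashed_b, hashed_c = (anonymize(r, key) for r in (rec_a, rec_b, rec_c))
print(classify(agreement(hashed_a, hashed_b)))  # duplicate
print(classify(agreement(hashed_a, hashed_c)))  # unique
```

Because a cryptographic hash changes completely when its input changes by one character, exact digests cannot absorb typos within a field. In this sketch the tolerance comes only from hashing fields separately, so that a pair can still be linked when some fields disagree; whether the paper achieves approximate linking the same way is not stated in the abstract.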


Pages: 404-414

Page count: 11
