Verification of De-Identification Techniques for Personal Information Using Tree-Based Methods with Shapley Values

Times cited: 12
Authors
Lee, Junhak [1]
Jeong, Jinwoo [1]
Jung, Sungji [1]
Moon, Jihoon [1]
Rho, Seungmin [1]
Affiliations
[1] Chung Ang Univ, Dept Ind Secur, Seoul 06974, South Korea
Funding
National Research Foundation, Singapore;
Keywords
de-identification; medical data; machine learning; tree-based method; explainable artificial intelligence; ARTIFICIAL-INTELLIGENCE; BIG DATA; PRIVACY PROTECTION; NEURAL-NETWORK; USABILITY; AGE;
DOI
10.3390/jpm12020190
Chinese Library Classification number
R19 [Health care organizations and services (health services administration)];
Abstract
With the development of big data and cloud computing technologies, the importance of pseudonym information has grown. However, tools for verifying whether a de-identification methodology has been applied correctly, so that both data confidentiality and usability are ensured, remain insufficient. This paper proposes a method for verifying de-identification techniques for personal healthcare information that considers both data confidentiality and usability. To represent the de-identification techniques as a numeric dataset, data are generated and preprocessed from actual statistical data, personal information datasets, and de-identification datasets based on medical data. Five tree-based regression models (decision tree, random forest, gradient boosting machine, extreme gradient boosting, and light gradient boosting machine) are built on the de-identification dataset to capture nonlinear relationships between the dependent and independent variables. The most effective model is then selected from personal information data in which pseudonym processing is essential for data utilization. Finally, the Shapley additive explanation (SHAP), an explainable artificial intelligence technique, is applied to the most effective model to support the establishment of pseudonym processing policies and to present a machine-learning process that selects an appropriate de-identification methodology.
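
The attribution step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names SHAP applied to a tree-based model, whereas the code below computes exact Shapley values by brute-force enumeration of feature coalitions, which is only practical for a handful of features. The model `toy_tree`, its three input names, and the all-zero baseline are hypothetical stand-ins chosen for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2**n, so this is only a didactic sketch.
    """
    n = len(x)

    def value(coalition):
        # Evaluate the model with only the coalition's features "switched on".
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_tree(z):
    """Hypothetical hand-written regression tree (assumption, not from the paper)."""
    age_generalization, zip_masking, noise_level = z
    if age_generalization > 0.5:
        return 0.4 if zip_masking > 0.5 else 0.7
    return 0.9 - 0.2 * noise_level


x = [1.0, 1.0, 0.5]         # hypothetical de-identification settings
baseline = [0.0, 0.0, 0.0]  # hypothetical reference point
phi = shapley_values(toy_tree, x, baseline)
print(phi)
```

The attributions sum to the difference between the prediction at `x` and the prediction at the baseline, which is the additivity property that makes Shapley-based explanations convenient for comparing the contribution of each de-identification setting.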
Pages: 19