Child Health Dataset Publishing and Mining Based on Differential Privacy Preservation

Cited by: 1
Authors
Li, Wenyu [1]
Wang, Siqi [1]
Wang, Hongwei [2]
Lu, Yunlong [1]
Affiliations
[1] Beihua Univ, Sch Math & Stat, Jilin 132013, Peoples R China
[2] Texas A&M Int Univ, Dept Math & Phys, Laredo, TX 78045 USA
Keywords
children health; synthetic data release; differential privacy; marginal histograms; logistic regression
DOI
10.3390/math12162487
Chinese Library Classification (CLC) number
O1 [Mathematics]
Subject classification code
0701; 070101
Abstract
As demand for data analysis and publishing grows, differential privacy is increasingly important for providing reliable, secure, and compliant datasets for children's health research. This paper studies differential privacy protection for an ultrasound examination health dataset of adolescents in southern Texas from three aspects: output perturbation of basic statistics, publication of differentially private marginal histograms and synthetic data, and a differentially private machine learning algorithm. First, the output perturbation results show that the Laplace and Gaussian mechanisms for numerical data, and the exponential mechanism for non-numerical data, can achieve the goal of protecting privacy; the exponential mechanism provides higher privacy protection. Second, with an appropriate privacy budget, a differentially private marginal histogram over four attributes can be obtained that approximates the marginal histogram of the original data. To publish synthetic data, we construct a synthetic query to obtain the corresponding differentially private histogram for two attributes. A synthetic dataset is then constructed by following the data distribution of the original dataset, and the quality of the published synthetic data is evaluated by mean square error and error rate. Finally, we consider a differentially private logistic regression model that predicts, as a binary classification task, whether children have fatty liver. The experimental results show that the model combined with quadratic perturbation has better accuracy and privacy protection. The paper thus provides differential privacy protection models for different demands, giving data managers and research organizations important options for data release and analysis and enriching the research on child health data release and mining.
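The mechanisms named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`dp_histogram`, `exponential_mechanism`, `synthesize`), the toy records, and the category labels are all hypothetical. The sketch assumes add-or-remove-one-record neighbouring datasets, so a count histogram has L1 sensitivity 1, and it assumes the category list is public and fixed in advance.

```python
import math
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_histogram(records, categories, epsilon, rng):
    """Laplace mechanism on a count histogram (assumed sensitivity 1)."""
    counts = Counter(records)
    scale = 1.0 / epsilon
    noisy = {}
    for c in categories:  # categories are assumed public, not data-derived
        value = counts.get(c, 0) + laplace_noise(scale, rng)
        # Rounding and clamping are post-processing and cost no extra budget.
        noisy[c] = max(0, round(value))
    return noisy


def exponential_mechanism(candidates, utility, epsilon, sensitivity, rng):
    """Pick a candidate with probability proportional to exp(eps*u/(2*sens))."""
    scores = [utility(c) for c in candidates]
    top = max(scores)  # shift scores for numerical stability
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


def synthesize(noisy_hist, n, rng):
    """Draw n synthetic records from the normalised noisy histogram."""
    cats = list(noisy_hist)
    weights = [noisy_hist[c] for c in cats]
    if sum(weights) == 0:  # degenerate case: fall back to uniform sampling
        weights = [1] * len(cats)
    return rng.choices(cats, weights=weights, k=n)


# Made-up toy data, for illustration only.
rng = random.Random(0)
records = ["normal"] * 70 + ["fatty_liver"] * 30
cats = ["normal", "fatty_liver"]

noisy = dp_histogram(records, cats, epsilon=1.0, rng=rng)
mode = exponential_mechanism(cats, lambda c: records.count(c), 1.0, 1.0, rng)
synthetic = synthesize(noisy, len(records), rng)
```

A smaller `epsilon` increases the Laplace scale and flattens the exponential-mechanism weights, trading accuracy for stronger privacy. The differentially private logistic regression step is not sketched here.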
Pages: 11