Privacy-Preserving Collaborative Data Collection and Analysis With Many Missing Values

Times cited: 10
Authors
Sei, Yuichi [1, 2]
Onesimu, J. Andrew [3]
Okumura, Hiroshi [4]
Ohsuga, Akihiko [1]
Affiliations
[1] Univ Electrocommun, Tokyo 1828585, Japan
[2] PRESTO, JST, Kawaguchi, Saitama 3320012, Japan
[3] Manipal Acad Higher Educ, Manipal Inst Technol, Dept Comp Sci & Engn, Manipal 576104, India
[4] Mitsubishi Res Inst, Tokyo 1008141, Japan
Keywords
Data collection; Servers; Differential privacy; Data models; COVID-19; Privacy; Hospitals; missing values; multi-dimensional analysis; privacy-preserving data collection; membership inference attacks; value imputation; copulas; noise
DOI
10.1109/TDSC.2022.3174887
Chinese Library Classification (CLC) number
TP3 [Computing technology; computer technology]
Discipline classification code
0812
Abstract
Privacy-preserving data mining techniques are useful for analyzing various kinds of information, such as Internet of Things data and COVID-19-related patient data. However, collecting a large amount of sensitive personal information is a challenging task. In addition, this information may contain missing values, which existing methods for collecting personal information under privacy guarantees do not consider. Failing to account for missing values reduces the accuracy of the data analysis. In this article, we propose a privacy-preserving data collection method that accounts for data with many missing values. The patient data are anonymized and sent to a data collection server. The data collection server creates a generative model and a contingency table suitable for multi-attribute analysis based on expectation-maximization and Gaussian copula methods. Using differential privacy (the de facto standard) as the privacy metric, we conduct experiments on synthetic and real data, including COVID-19-related data. The proposed method's results are 50-80% more accurate than those of existing methods that do not consider missing values.
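The abstract describes a two-stage pattern: each record is perturbed before it reaches the server, and the server then recovers aggregate statistics with expectation-maximization (EM). The sketch below illustrates that pattern for a single categorical attribute only and is not the authors' algorithm. It assumes generalized (k-ary) randomized response as the local differential privacy mechanism and treats a missing value as one extra category (MISSING), so the report does not reveal whether the value was missing. The names perturb, em_estimate, and MISSING are hypothetical, and the paper's multi-attribute contingency table and Gaussian copula model are not reproduced.

```python
import math
import random

# Hypothetical marker for a missing attribute value; it is perturbed like any other category.
MISSING = None


def perturb(value, domain, epsilon, rng):
    """Client side: k-ary randomized response over the domain plus MISSING.

    Keeps the true symbol with probability e^eps / (e^eps + k - 1); otherwise
    reports one of the other k - 1 symbols uniformly at random, which satisfies
    epsilon-local differential privacy for this single attribute.
    """
    symbols = list(domain) + [MISSING]
    k = len(symbols)
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_keep:
        return value
    others = [s for s in symbols if s != value]
    return rng.choice(others)


def em_estimate(reports, domain, epsilon, iters=200):
    """Server side: EM estimate of the true distribution, including MISSING."""
    symbols = list(domain) + [MISSING]
    k = len(symbols)
    denom = math.exp(epsilon) + k - 1
    p = math.exp(epsilon) / denom  # P(report = true symbol)
    q = 1.0 / denom                # P(report = one specific other symbol)

    counts = {s: 0 for s in symbols}
    for r in reports:
        counts[r] += 1
    n = len(reports)

    theta = {s: 1.0 / k for s in symbols}  # uniform starting point
    for _ in range(iters):
        expected = {s: 0.0 for s in symbols}
        for r, c in counts.items():
            if c == 0:
                continue
            # E-step: posterior over the true symbol given the observed report r.
            w = {s: theta[s] * (p if s == r else q) for s in symbols}
            z = sum(w.values())
            for s in symbols:
                expected[s] += c * w[s] / z
        # M-step: normalize the expected true counts into a distribution.
        theta = {s: expected[s] / n for s in symbols}
    return theta


# Example: 20% of the records are missing the attribute.
rng = random.Random(0)
domain = ["A", "B", "C"]
true_values = ["A"] * 5000 + ["B"] * 3000 + [MISSING] * 2000
reports = [perturb(v, domain, 2.0, rng) for v in true_values]
est = em_estimate(reports, domain, 2.0)
print({("missing" if s is MISSING else s): round(v, 3) for s, v in est.items()})
```

Keeping MISSING inside the perturbed domain means the server can estimate the missing-value rate directly instead of silently dropping incomplete records, which is the kind of bias the abstract attributes to methods that ignore missing values.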
Pages: 2158-2173
Number of pages: 16