Privacy-Preserving Collaborative Data Collection and Analysis With Many Missing Values

Cited by: 10
Authors
Sei, Yuichi [1, 2]
Onesimu, J. Andrew [3]
Okumura, Hiroshi [4]
Ohsuga, Akihiko [1]
Affiliations
[1] Univ Electrocommun, Tokyo 1828585, Japan
[2] PRESTO, JST, Kawaguchi, Saitama 3320012, Japan
[3] Manipal Acad Higher Educ, Manipal Inst Technol, Dept Comp Sci & Engn, Manipal 576104, India
[4] Mitsubishi Res Inst, Tokyo 1008141, Japan
Keywords
Data collection; Servers; Differential privacy; Data models; COVID-19; Privacy; Hospitals; differential privacy; missing values; multi-dimensional analysis; privacy-preserving data collection; MEMBERSHIP INFERENCE ATTACKS; VALUE IMPUTATION; COPULAS; NOISE;
DOI
10.1109/TDSC.2022.3174887
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Privacy-preserving data mining techniques are useful for analyzing various information, such as Internet of Things data and COVID-19-related patient data. However, collecting a large amount of sensitive personal information is a challenging task. In addition, this information may have missing values, which are not considered in the existing methods for collecting personal information while ensuring data privacy. Failure to account for missing values reduces the accuracy of the data analysis. In this article, we propose a method for privacy-preserving data collection that considers many missing values. The patient data are anonymized and sent to a data collection server. The data collection server creates a generative model and a contingency table suitable for multi-attribute analysis based on expectation-maximization and Gaussian copula methods. Using differential privacy (the de facto standard) as a privacy metric, we conduct experiments on synthetic and real data, including COVID-19-related data. The results are 50-80% more accurate than those of existing methods that do not consider missing values.
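The abstract names two building blocks: client-side anonymization under differential privacy, and server-side estimation based on expectation-maximization. The following is a minimal sketch of that general pattern for a single categorical attribute. It is not the authors' method. It assumes generic k-ary randomized response on the client, a plain EM estimate of the category distribution on the server, and (purely for illustration) that a missing value is reported as one extra category. The function names `randomize` and `em_estimate` are hypothetical, and the Gaussian copula step for multi-attribute analysis is not covered.

```python
import math
import random


def randomize(value, k, epsilon, rng):
    """k-ary randomized response (illustrative, not the paper's mechanism).

    Keeps the true category with probability p = e^eps / (e^eps + k - 1);
    otherwise reports one of the other k - 1 categories uniformly at random.
    The ratio p / q equals e^eps, which gives eps-local differential privacy.
    """
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    other = rng.randrange(k - 1)          # index among the k - 1 other categories
    return other if other < value else other + 1


def em_estimate(reports, k, epsilon, iterations=200):
    """Estimate the true category distribution from randomized reports via EM."""
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    q = (1.0 - p) / (k - 1)
    counts = [0] * k
    for r in reports:
        counts[r] += 1
    n = len(reports)
    theta = [1.0 / k] * k                 # start from the uniform distribution
    for _ in range(iterations):
        new_theta = [0.0] * k
        for j in range(k):                # j: reported category
            if counts[j] == 0:
                continue
            # E-step: posterior over the true category i given report j
            weights = [theta[i] * (p if i == j else q) for i in range(k)]
            total = sum(weights)
            for i in range(k):
                new_theta[i] += counts[j] * weights[i] / total
        # M-step: normalize expected counts into a distribution
        theta = [t / n for t in new_theta]
    return theta


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: categories 0..2 are real values, category 3 means "missing".
    true_dist = [0.5, 0.3, 0.15, 0.05]
    k, epsilon, n = 4, 1.0, 20000
    truth = rng.choices(range(k), weights=true_dist, k=n)
    reports = [randomize(v, k, epsilon, rng) for v in truth]
    print([round(t, 3) for t in em_estimate(reports, k, epsilon)])
```

The server sees only the randomized reports, so the estimate approaches the true distribution as the number of participants grows, while a smaller `epsilon` gives stronger privacy at the cost of a noisier estimate.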
Pages: 2158-2173
Number of pages: 16
Related papers
50 in total
  • [21] Privacy-preserving data collection for 1: M dataset
    M. Abrar
    B. Zuhaira
    A. Anjum
    Multimedia Tools and Applications, 2021, 80 : 31335 - 31356
  • [22] Privacy-preserving collaborative filtering
    Polat, H
    Du, WL
    INTERNATIONAL JOURNAL OF ELECTRONIC COMMERCE, 2005, 9 (04) : 9 - 35
  • [23] A privacy-preserving data collection model for digital community
    LI HongTao
    MA JianFeng
    FU Shuai
    Science China (Information Sciences), 2015, 58 (03) : 36 - 51
  • [24] Privacy-preserving data collection for 1: M dataset
    Abrar, M.
    Zuhaira, B.
    Anjum, A.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (20) : 31335 - 31356
  • [25] Towards Accurate Truth Discovery With Privacy-Preserving Over Crowdsourced Data Streams
    Gong, Zhimao
    Yang, Zhibang
    Yang, Shenghong
    Yu, Siyang
    Li, Kenli
    Duan, Mingxing
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2025, 37 (04) : 2155 - 2168
  • [26] Privacy Integrated Queries An Extensible Platform for Privacy-Preserving Data Analysis
    McSherry, Frank
    ACM SIGMOD/PODS 2009 CONFERENCE, 2009, : 19 - 30
  • [27] Privacy-preserving collaborative social network data publishing against colluding data providers
    Kadhiwala, Bintu
    Patel, Sankita J.
    INTERNATIONAL JOURNAL OF INFORMATION AND COMPUTER SECURITY, 2022, 19 (3-4) : 346 - 378
  • [28] Secret Specification Based Personalized Privacy-Preserving Analysis in Big Data
    Chen, Jiajun
    Hu, Chunqiang
    Liu, Zewei
    Xiang, Tao
    Hu, Pengfei
    Yu, Jiguo
    IEEE TRANSACTIONS ON BIG DATA, 2025, 11 (02) : 774 - 787
  • [29] DP-FedCMRS: Privacy-Preserving Federated Learning Algorithm to Solve Heterogeneous Data
    Zhang, Yang
    Long, Shigong
    Liu, Guangyuan
    Zhang, Junming
    IEEE ACCESS, 2025, 13 : 41984 - 41993
  • [30] Privacy-preserving hybrid collaborative filtering on cross distributed data
    Ibrahim Yakut
    Huseyin Polat
    Knowledge and Information Systems, 2012, 30 : 405 - 433