PPHOPCM: Privacy-Preserving High-Order Possibilistic c-Means Algorithm for Big Data Clustering with Cloud Computing

Cited by: 72

Authors:

Zhang, Qingchen [1,2]

Yang, Laurence T. [1,2]

Chen, Zhikui [3]

Li, Peng [3]

Affiliations:

[1] Univ Elect Sci & Technol China, Sch Elect Engn, Chengdu 611731, Peoples R China

[2] St Francis Xavier Univ, Dept Comp Sci, Antigonish, NS B2G 2W5, Canada

[3] Dalian Univ Technol, Sch Software Technol, Dalian 116023, Ganjingzi, Peoples R China

Source:

IEEE TRANSACTIONS ON BIG DATA | 2022 / Vol. 8 / No. 1

Keywords:

Big data clustering; cloud computing; privacy preserving; possibilistic c-means; tensor space; HETEROGENEOUS DATA;

DOI:

10.1109/TBDATA.2017.2701816

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

As an important fuzzy clustering technique in data mining and pattern recognition, the possibilistic c-means algorithm (PCM) has been widely used in image analysis and knowledge discovery. However, PCM struggles to produce good results when clustering big data, especially heterogeneous data, since it was originally designed only for small structured datasets. To tackle this problem, the paper proposes a high-order PCM algorithm (HOPCM) for big data clustering by optimizing the objective function in the tensor space. Further, we design a distributed HOPCM method based on MapReduce for very large amounts of heterogeneous data. Finally, we devise a privacy-preserving HOPCM algorithm (PPHOPCM) to protect private data on the cloud by applying the BGV encryption scheme to HOPCM. In PPHOPCM, the functions for updating the membership matrix and clustering centers are approximated as polynomial functions to support the secure computing of the BGV scheme. Experimental results indicate that PPHOPCM can effectively cluster large amounts of heterogeneous data using cloud computing without disclosing private data.


Pages: 25-34

Number of pages: 10
