PPHOPCM: Privacy-Preserving High-Order Possibilistic c-Means Algorithm for Big Data Clustering with Cloud Computing

Cited by: 72
Authors
Zhang, Qingchen [1, 2]
Yang, Laurence T. [1, 2]
Chen, Zhikui [3]
Li, Peng [3]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Elect Engn, Chengdu 611731, Peoples R China
[2] St Francis Xavier Univ, Dept Comp Sci, Antigonish, NS B2G 2W5, Canada
[3] Dalian Univ Technol, Sch Software Technol, Dalian 116023, Ganjingzi, Peoples R China
Keywords
Big data clustering; cloud computing; privacy preserving; possibilistic c-means; tensor space; heterogeneous data
DOI
10.1109/TBDATA.2017.2701816
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
As an important fuzzy clustering technique in data mining and pattern recognition, the possibilistic c-means algorithm (PCM) has been widely used in image analysis and knowledge discovery. However, PCM struggles to produce good results when clustering big data, especially heterogeneous data, because it was originally designed only for small structured datasets. To tackle this problem, this paper proposes a high-order PCM algorithm (HOPCM) for big data clustering that optimizes the objective function in the tensor space. Further, we design a distributed HOPCM method based on MapReduce for very large amounts of heterogeneous data. Finally, we devise a privacy-preserving HOPCM algorithm (PPHOPCM) that protects private data on the cloud by applying the BGV encryption scheme to HOPCM. In PPHOPCM, the functions for updating the membership matrix and the clustering centers are approximated by polynomial functions so that they can be computed securely under the BGV scheme. Experimental results indicate that PPHOPCM can effectively cluster large amounts of heterogeneous data using cloud computing without disclosing private data.
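For readers unfamiliar with PCM, the sketch below illustrates the kind of alternating update the abstract refers to (typicality/membership update, then center update). It is a minimal sketch assuming the classic vector-space PCM of Krishnapuram and Keller with fuzzifier m = 2, a user-supplied per-cluster scale `eta`, and a fixed iteration count. It is not the paper's tensor-space HOPCM, its MapReduce variant, or the BGV-encrypted, polynomial-approximated PPHOPCM; the function names `pcm`, `update_typicality`, and `update_centers` are hypothetical.

```python
"""Hypothetical sketch of classic (vector-space) possibilistic c-means.

Assumption: standard Krishnapuram-Keller PCM, not the paper's HOPCM/PPHOPCM.
"""
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sq_dist(x: Vector, v: Vector) -> float:
    """Squared Euclidean distance between a data point and a center."""
    return sum((a - b) ** 2 for a, b in zip(x, v))


def update_typicality(data: List[Vector], centers: List[Vector],
                      eta: List[float], m: float = 2.0) -> Matrix:
    """t[i][j] = 1 / (1 + (d_ij^2 / eta_i)^(1/(m-1))) for cluster i, point j."""
    return [
        [1.0 / (1.0 + (sq_dist(x, v) / eta[i]) ** (1.0 / (m - 1.0))) for x in data]
        for i, v in enumerate(centers)
    ]


def update_centers(data: List[Vector], t: Matrix, m: float = 2.0) -> List[Vector]:
    """v_i = sum_j t_ij^m * x_j / sum_j t_ij^m (weighted mean of the points)."""
    dim = len(data[0])
    centers = []
    for row in t:
        weights = [tij ** m for tij in row]
        total = sum(weights)
        centers.append(
            [sum(w * x[k] for w, x in zip(weights, data)) / total for k in range(dim)]
        )
    return centers


def pcm(data: List[Vector], centers: List[Vector], eta: List[float],
        m: float = 2.0, iters: int = 20) -> Tuple[List[Vector], Matrix]:
    """Alternate typicality and center updates for a fixed number of iterations."""
    t = update_typicality(data, centers, eta, m)
    for _ in range(iters):
        centers = update_centers(data, t, m)
        t = update_typicality(data, centers, eta, m)
    return centers, t


if __name__ == "__main__":
    points = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.2],
              [5.0, 5.0], [5.2, 4.9], [4.9, 5.1]]
    final_centers, typicality = pcm(points, [[1.0, 1.0], [4.0, 4.0]], eta=[1.0, 1.0])
    print(final_centers)
```

On two well-separated groups of points, each center settles near the mean of its own group. Unlike fuzzy c-means, each row of typicalities is not constrained to sum to 1 across clusters, which is what makes the method "possibilistic". The division and the fractional power in `update_typicality` are the non-polynomial operations that, per the abstract, PPHOPCM replaces with polynomial approximations so they can be evaluated under BGV.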
Pages: 25-34
Number of pages: 10