A Novel Scalable Kernelized Fuzzy Clustering Algorithms Based on In-Memory Computation for Handling Big Data

Cited by: 12
Authors
Jha, Preeti [1]
Tiwari, Aruna [1]
Bharill, Neha [2]
Ratnaparkhe, Milind [3]
Mounika, Mukkamalla [1]
Nagendra, Neha [1]
Affiliations
[1] Indian Inst Technol, Indore 453552, India
[2] Mahindra Univ, Ecole Cent Sch Engn, Hyderabad 500043, India
[3] ICAR Indian Inst Soybean Res, Indore 452001, India
Source
IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE | 2021, Vol. 5, No. 6
Keywords
Clustering algorithms; Kernel; Big Data; Optimization; Kernelized clustering algorithms; Nonlinear separable; in-memory computation;
DOI
10.1109/TETCI.2020.3016302
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Traditional scalable clustering algorithms mainly handle linearly separable data, and clustering non-linearly separable data efficiently in the feature space remains challenging. In this article, we propose a novel Kernelized Scalable Random Sampling with Iterative Optimization Fuzzy c-Means (KSRSIO-FCM) clustering algorithm built on a Big Data framework. As an integral part of KSRSIO-FCM, we also propose a kernelized version of the Scalable Literal Fuzzy c-Means (KSLFCM) clustering algorithm. Both kernelized algorithms handle non-linearly separable problems by applying a Radial Basis Function (RBF) kernel, which maps the input data space non-linearly into a high-dimensional feature space. We design and implement these kernelized fuzzy clustering algorithms on Apache Spark, whose in-memory cluster computing enables efficient clustering of Big Data. Exhaustive experiments on various big datasets demonstrate the effectiveness of the proposed KSRSIO-FCM in comparison with other scalable clustering algorithms, namely KSLFCM, SRSIO-FCM, and SLFCM. The reported results show that KSRSIO-FCM achieves significant improvements over KSLFCM, SRSIO-FCM, and SLFCM in time and space complexity, Normalized Mutual Information (NMI), Adjusted Rand Index (ARI), and F-score. We also carry out a performance analysis of KSRSIO-FCM versus KSLFCM. Overall, the reported results indicate that KSRSIO-FCM implemented on Apache Spark has better potential for Big Data clustering than traditional scalable fuzzy clustering methods.
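The abstract does not state the update equations, so the following is only a minimal single-machine sketch of what kernelizing fuzzy c-means with an RBF kernel can look like. It assumes the common Gaussian-kernel variant in which prototypes stay in input space and the squared feature-space distance is 2(1 - K(x, v)); KSLFCM may use a different formulation. All names (rbf, update_memberships, update_centers, kernel_fcm, sampled_kernel_fcm) and parameters (m, sigma, n_subsets, seed) are hypothetical. sampled_kernel_fcm is an assumed reading of "random sampling with iterative optimization" (cluster random disjoint subsets in turn, seeding each run with the previous centers), not the authors' KSRSIO-FCM, and nothing here runs on Apache Spark.

```python
import math
import random


def rbf(x, v, sigma):
    """Gaussian RBF kernel K(x, v) = exp(-||x - v||^2 / (2 * sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq / (2.0 * sigma ** 2))


def update_memberships(data, centers, m, sigma, eps=1e-12):
    """Fuzzy memberships from the kernel-induced squared distance 2 * (1 - K(x, v))."""
    u = []
    for x in data:
        # eps keeps the distance positive when a point coincides with a center
        d = [2.0 * (1.0 - rbf(x, v, sigma)) + eps for v in centers]
        inv = [di ** (-1.0 / (m - 1.0)) for di in d]
        total = sum(inv)
        u.append([w / total for w in inv])  # each row sums to 1
    return u


def update_centers(data, u, centers, m, sigma):
    """Input-space prototypes: v_i = sum_k u_ik^m K(x_k, v_i) x_k / sum_k u_ik^m K(x_k, v_i)."""
    dim = len(data[0])
    new_centers = []
    for i, v in enumerate(centers):
        num = [0.0] * dim
        den = 0.0
        for k, x in enumerate(data):
            w = (u[k][i] ** m) * rbf(x, v, sigma)
            den += w
            for j in range(dim):
                num[j] += w * x[j]
        # keep the old center if every weight underflowed to zero
        new_centers.append([n / den for n in num] if den > 0 else list(v))
    return new_centers


def kernel_fcm(data, centers, m=2.0, sigma=1.0, max_iter=100, tol=1e-6):
    """Alternate prototype and membership updates until the centers stop moving."""
    centers = [list(c) for c in centers]
    u = update_memberships(data, centers, m, sigma)
    for _ in range(max_iter):
        new_centers = update_centers(data, u, centers, m, sigma)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        u = update_memberships(data, centers, m, sigma)
        if shift < tol:
            break
    return centers, u


def sampled_kernel_fcm(data, c, n_subsets=4, seed=0, **kwargs):
    """Hypothetical sampling scheme: cluster random disjoint subsets in turn,
    seeding each run with the centers found on the previous subset."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    subsets = [shuffled[s::n_subsets] for s in range(n_subsets)]
    centers = rng.sample(subsets[0], c)  # random initial centers from subset 0
    for subset in subsets:
        centers, _ = kernel_fcm(subset, centers, **kwargs)
    return centers
```

For example, on two well-separated groups of four points around (0.5, 0.5) and (10.5, 10.5), kernel_fcm(data, [[2.0, 2.0], [8.0, 8.0]], m=2.0, sigma=2.0) should settle near the two group means. Because the Gaussian kernel decays quickly, points far from every center contribute almost nothing to a prototype, so sigma and the initial centers matter. In a distributed setting, the per-center numerator and denominator sums in update_centers are the kind of quantities one could compute per partition and then reduce; that is an assumption about a possible design, not a description of the paper's Spark implementation.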
Pages: 908-919
Page count: 12