Fuzzy knowledge based performance analysis on big data

Cited by: 5
Authors
Bharill, Neha [1]
Tiwari, Aruna [2]
Malviya, Aayushi [3]
Patel, Om Prakash [1]
Gupta, Akahansh [4]
Puthal, Deepak [5]
Saxena, Amit [6]
Prasad, Mukesh [7]
Affiliations
[1] Jaypee Inst Informat Technol Noida, Dept Comp Sci & Engn, Noida 201304, India
[2] Indian Inst Technol Indore, Dept Comp Sci & Engn, Indore 453552, Madhya Pradesh, India
[3] Microsoft & Res Grp, Hyderabad 500032, India
[4] Jawaharlal Nehru Univ, Sch Comp & Syst Sci, New Delhi, India
[5] Univ Technol Sydney, Sch Elect & Data Engn, FEIT, Sydney, NSW, Australia
[6] Guru Ghasidas Vishwavidyalaya, Dept Comp Sci & IT, Bilaspur, India
[7] Univ Technol Sydney, Ctr Artificial Intelligence, Sydney, NSW, Australia
Keywords
Incremental clustering algorithms; Big data; Apache Spark framework; Parallel processing; Very large data; Internet of things; Complexity; Kernel
DOI
10.1016/j.neucom.2018.10.088
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Emerging technologies generate an enormous amount of data every day, termed Big Data, which can be of great use in various domains. Clustering algorithms that load the entire dataset into memory for analysis become infeasible when the dataset is too large. Many clustering algorithms in the literature address the analysis of huge amounts of data. This paper discusses a new clustering approach, the Incremental Random Sampling with Iterative Optimization Fuzzy c-Means (IRSIO-FCM) algorithm. It is implemented on Apache Spark, a framework for Big Data processing. Spark is well suited to iterative algorithms because it supports in-memory computation and scalability. IRSIO-FCM not only facilitates effective clustering of Big Data but also optimizes storage space during clustering. To establish a fair comparison for IRSIO-FCM, we propose an incremental version of the Literal Fuzzy c-Means (LFCM), called ILFCM, also implemented on the Apache Spark framework. The experimental results are analyzed in terms of time and space complexity, normalized mutual information (NMI), adjusted Rand index (ARI), speedup, sizeup, and scaleup. The reported results show that IRSIO-FCM achieves a significant reduction in run time compared with ILFCM. (C) 2019 Elsevier B.V. All rights reserved.
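The abstract names fuzzy c-means without stating its update rules. As a rough illustration only, the sketch below runs the textbook (literal) fuzzy c-means iteration in plain Python on a small in-memory dataset. It is not the paper's IRSIO-FCM or ILFCM and does not use Apache Spark. The function name `fuzzy_c_means` and its parameters (`init_centers`, `m`, `max_iter`, `tol`) are assumptions made for this example.

```python
import math


def fuzzy_c_means(points, init_centers, m=2.0, max_iter=100, tol=1e-6):
    """Textbook fuzzy c-means on equal-length numeric tuples.

    Illustrative sketch only; all names here are hypothetical.
    Returns (centers, memberships), where memberships[i][j] is the degree
    to which points[i] belongs to cluster j, computed in the last
    membership step that was run.
    """
    centers = [list(v) for v in init_centers]
    c, dim = len(centers), len(points[0])
    power = 2.0 / (m - 1.0)  # exponent in the membership formula (m > 1)
    memberships = []
    for _ in range(max_iter):
        # Membership step: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        memberships = []
        for p in points:
            dists = [math.dist(p, v) for v in centers]
            zeros = [d == 0.0 for d in dists]
            if any(zeros):
                # The point sits exactly on one or more centers:
                # split its membership evenly among those centers.
                share = 1.0 / sum(zeros)
                row = [share if z else 0.0 for z in zeros]
            else:
                row = [1.0 / sum((dj / dk) ** power for dk in dists)
                       for dj in dists]
            memberships.append(row)
        # Center step: mean of the points weighted by u_ij^m.
        new_centers = []
        for j in range(c):
            weights = [row[j] ** m for row in memberships]
            total = sum(weights)
            new_centers.append(
                [sum(w * p[d] for w, p in zip(weights, points)) / total
                 for d in range(dim)]
            )
        # Stop once no center moved by more than tol.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships
```

Calling `fuzzy_c_means(points, init_centers)` with two well-separated groups of points and one starting center near each group should return memberships close to 0 or 1. Each pass reads every point, which is the whole-dataset-in-memory pattern the abstract describes as infeasible for very large data.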
Pages: 218-228
Page count: 11