Apache Mahout's k-Means vs. Fuzzy k-Means Performance Evaluation

Times cited: 0
Authors
Xhafa, Fatos [1, 3]
Bogza, Adriana [1]
Caballe, Santi [2]
Barolli, Leonard
Affiliations
[1] Univ Politecn Cataluna, Barcelona, Spain
[2] Univ Oberta Catalunya, Barcelona, Spain
[3] Univ Oberta Catalunya, SmartLearn Grp, Barcelona, Spain
Source
2016 8TH INTERNATIONAL CONFERENCE ON INTELLIGENT NETWORKING AND COLLABORATIVE SYSTEMS (INCOS) | 2016
Keywords
Data Mining Algorithms; Apache Mahout; Big Data; k-Means; Fuzzy k-Means; Performance; Hadoop Cluster;
DOI
10.1109/INCoS.2016.103
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
The emergence of Big Data as a disruptive technology for the next generation of intelligent systems has raised many issues about how to extract and use the knowledge contained in the data within short time frames, on a limited budget, and under high rates of data generation. The foremost challenge identified here is data processing, especially mining and analysis for knowledge extraction. Because older data mining frameworks were designed without Big Data requirements in mind, a new generation of such frameworks, fully implemented on Cloud platforms, is being developed. One such framework is Apache Mahout, which aims to enable fast processing and analysis of Big Data. The performance of these new data mining frameworks has yet to be evaluated, and their potential limitations have yet to be revealed. In this paper we analyse the performance of Apache Mahout using large real data sets from the Twitter stream. We exemplify the analysis with two clustering algorithms, namely k-Means and Fuzzy k-Means, using a Hadoop cluster infrastructure for the experimental study.
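The abstract names the two algorithms but does not describe how they differ. The following is a minimal, hypothetical sketch in plain Python, not Mahout's Java/Hadoop implementation and not the authors' code. It assumes the standard Lloyd update for k-Means and the standard fuzzy c-means update rules for Fuzzy k-Means, with a fuzziness exponent `m > 1`. All function names are illustrative. The key contrast is the assignment step: k-Means gives each point to exactly one cluster, while Fuzzy k-Means gives each point a membership degree in every cluster and weights it by `u**m` when recomputing each center.

```python
import math


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_step(points, centers):
    """One k-Means (Lloyd) iteration: hard-assign each point to its nearest
    center, then recompute every center as the mean of its members."""
    clusters = [[] for _ in centers]
    for p in points:
        j = min(range(len(centers)), key=lambda k: dist2(p, centers[k]))
        clusters[j].append(p)
    new_centers = []
    for c, members in zip(centers, clusters):
        if members:
            new_centers.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new_centers.append(list(c))  # empty cluster: keep its old center
    return new_centers


def fuzzy_memberships(p, centers, m):
    """Soft memberships of point p in every cluster (fuzzy c-means rule, m > 1).
    The returned values are non-negative and sum to 1."""
    d = [math.sqrt(dist2(p, c)) for c in centers]
    if 0.0 in d:
        # Point coincides with a center: full membership in the first such center.
        idx = d.index(0.0)
        return [1.0 if k == idx else 0.0 for k in range(len(d))]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** exp for dk in d) for di in d]


def fuzzy_kmeans_step(points, centers, m=2.0):
    """One Fuzzy k-Means iteration: every point contributes to every center,
    weighted by its membership raised to the power m."""
    u = [fuzzy_memberships(p, centers, m) for p in points]
    dims = len(points[0])
    new_centers = []
    for j, c in enumerate(centers):
        w = [row[j] ** m for row in u]
        total = sum(w)
        if total == 0.0:
            new_centers.append(list(c))  # degenerate case: no weight, keep center
            continue
        new_centers.append(
            [sum(wi * p[dim] for wi, p in zip(w, points)) / total for dim in range(dims)]
        )
    return new_centers


if __name__ == "__main__":
    pts = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    init = [[0.0, 0.0], [10.0, 10.0]]
    print(kmeans_step(pts, init))  # [[0.0, 0.5], [10.0, 10.5]]
    print(fuzzy_memberships([0.0, 1.0], init, 2.0))
    print(fuzzy_kmeans_step(pts, init, 2.0))
```

In this sketch, the k-Means update touches each point once (for its single cluster), whereas the fuzzy update sums over all points for every center, so each fuzzy iteration does more arithmetic on the same data. Larger `m` makes memberships softer; as `m` approaches 1 the fuzzy assignment approaches the hard k-Means assignment.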
Pages: 110-116
Number of pages: 7
Related papers
  • [1] Soil data clustering by using K-means and fuzzy K-means algorithm
    Hot, Elma
    Popovic-Bugarin, Vesna
    2015 23RD TELECOMMUNICATIONS FORUM TELFOR (TELFOR), 2015, : 890 - 893
  • [2] Clustering of Image Data Using K-Means and Fuzzy K-Means
    Rahmani, Md. Khalid Imam
    Pal, Naina
    Arora, Kamiya
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2014, 5 (07) : 160 - 163
  • [4] Performance evaluation of K-means clustering on Hadoop infrastructure
    Vats, Satvik
    Sagar, B. B.
    JOURNAL OF DISCRETE MATHEMATICAL SCIENCES & CRYPTOGRAPHY, 2019, 22 (08) : 1349 - 1363
  • [5] Fuzzy k-Means: history and applications
    Ferraro, Maria Brigida
    ECONOMETRICS AND STATISTICS, 2024, 30 : 110 - 123
  • [6] Deep k-Means: Jointly clustering with k-Means and learning representations
    Fard, Maziar Moradi
    Thonet, Thibaut
    Gaussier, Eric
    PATTERN RECOGNITION LETTERS, 2020, 138 : 185 - 192
  • [7] CLUSTERING THE PHYSICO-CHEMICAL PROPERTIES OF SEVENTEEN APPROVED BREAST CANCER DRUGS WITH K-MEANS AND FUZZY K-MEANS
    Gupta, V. M. N. S. S. V. K. R.
    Krishna, Ch V. Phani
    INTERNATIONAL JOURNAL OF GRID AND DISTRIBUTED COMPUTING, 2020, 13 (01) : 23 - 51
  • [8] Density K-means : A New Algorithm for Centers Initialization for K-means
    Lan, Xv
    Li, Qian
    Zheng, Yi
    PROCEEDINGS OF 2015 6TH IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING AND SERVICE SCIENCE, 2015, : 958 - 961
  • [9] PSO Aided k-Means Clustering: Introducing Connectivity in k-Means
    Breaban, Mihaela Elena
    Luchian, Henri
    GECCO-2011: PROCEEDINGS OF THE 13TH ANNUAL GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE, 2011, : 1227 - 1234
  • [10] Anomaly Detection by Using Streaming K-Means and Batch K-Means
    Wang, Zhuo
    Zhou, Yanghui
    Li, Gangmin
    2020 5TH IEEE INTERNATIONAL CONFERENCE ON BIG DATA ANALYTICS (IEEE ICBDA 2020), 2020, : 11 - 17