Apache Mahout's k-Means vs. Fuzzy k-Means Performance Evaluation

Cited by: 0
Authors
Xhafa, Fatos [1, 3]
Bogza, Adriana [1]
Caballe, Santi [2]
Barolli, Leonard
Affiliations
[1] Univ Politecn Cataluna, Barcelona, Spain
[2] Univ Oberta Catalunya, Barcelona, Spain
[3] Univ Oberta Catalunya, SmartLearn Grp, Barcelona, Spain
Source
2016 8TH INTERNATIONAL CONFERENCE ON INTELLIGENT NETWORKING AND COLLABORATIVE SYSTEMS (INCOS) | 2016
Keywords
Data Mining Algorithms; Apache Mahout; Big Data; k-Means; Fuzzy k-Means; Performance; Hadoop Cluster;
DOI
10.1109/INCoS.2016.103
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
The emergence of Big Data as a disruptive technology for the next generation of intelligent systems has raised many issues of how to extract and make use of the knowledge obtained from data within short time frames, on limited budgets, and under high rates of data generation. The foremost challenge identified here is data processing, especially mining and analysis for knowledge extraction. Because older data mining frameworks were designed without Big Data requirements in mind, a new generation of such frameworks, fully implemented on Cloud platforms, is being developed. One such framework is Apache Mahout, which aims to support fast processing and analysis of Big Data. The performance of these new data mining frameworks has yet to be evaluated, and their potential limitations have yet to be revealed. In this paper we analyse the performance of Apache Mahout using large real data sets from the Twitter stream. We exemplify the analysis with two clustering algorithms, namely k-Means and Fuzzy k-Means, using a Hadoop cluster infrastructure for the experimental study.
Pages: 110-116
Page count: 7