Apache Mahout's k-Means vs. Fuzzy k-Means Performance Evaluation

Cited by: 0

Authors:

Xhafa, Fatos [1,3]

Bogza, Adriana [1]

Caballe, Santi [2]

Barolli, Leonard

Affiliations:

[1] Univ Politecn Cataluna, Barcelona, Spain

[2] Univ Oberta Catalunya, Barcelona, Spain

[3] Univ Oberta Catalunya, SmartLearn Grp, Barcelona, Spain

Source:

2016 8TH INTERNATIONAL CONFERENCE ON INTELLIGENT NETWORKING AND COLLABORATIVE SYSTEMS (INCOS) | 2016

Keywords:

Data Mining Algorithms; Apache Mahout; Big Data; k-Means; Fuzzy k-Means; Performance; Hadoop Cluster;

DOI:

10.1109/INCoS.2016.103

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

The emergence of Big Data as a disruptive technology for the next generation of intelligent systems has raised many questions about how to extract and use the knowledge obtained from data within short times, on limited budgets, and under high rates of data generation. The foremost challenge identified here is data processing, especially mining and analysis for knowledge extraction. As older data mining frameworks were designed without Big Data requirements in mind, a new generation of such frameworks, fully implemented on Cloud platforms, is being developed. One such framework is Apache Mahout, which aims to support fast processing and analysis of Big Data. The performance of these new data mining frameworks has yet to be evaluated, and their potential limitations have yet to be revealed. In this paper we analyse the performance of Apache Mahout using large real data sets from the Twitter stream. We exemplify the analysis with two clustering algorithms, namely k-Means and Fuzzy k-Means, using a Hadoop cluster infrastructure for the experimental study.


Pages: 110-116

Page count: 7

