A Data Science and Engineering Solution for Fast k-Means Clustering of Big Data

Cited by: 39
Authors
Dierckens, Karl E. [1]
Harrison, Adrian B. [1]
Leung, Carson K. [1]
Pind, Adrienne V. [1]
Affiliations
[1] Univ Manitoba, Dept Comp Sci, Winnipeg, MB, Canada
Source
2017 16TH IEEE INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS / 11TH IEEE INTERNATIONAL CONFERENCE ON BIG DATA SCIENCE AND ENGINEERING / 14TH IEEE INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS | 2017
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
Big data; data mining; clustering; k-means; IMPLEMENTATION; ALGORITHM;
DOI
10.1109/Trustcom/BigDataSE/ICESS.2017.332
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
With advances in technology, high volumes of a wide variety of valuable data of different veracity can be easily collected or generated at a high velocity in the current era of big data. Embedded in these big data are implicit, previously unknown and potentially useful information and knowledge. Hence, fast and scalable big data science and engineering solutions that mine and discover knowledge from these big data are in demand. A popular and practical data mining task is to group similar data or objects into clusters (i.e., clustering). While k-means clustering is popular and leads to good-quality results, its associated algorithms may suffer from a few problems (e.g., risks associated with randomly selected k representatives, tendency to produce spherical clusters, high runtime complexity). To deal with these problems, we present in this paper a fast big data science and engineering solution that applies a fast k-means clustering heuristic for grouping similar big data objects. Evaluation results show the efficiency and scalability of our solution in k-means clustering of big data.
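For readers unfamiliar with the baseline, the following is a minimal sketch of the standard (Lloyd's) k-means iteration that the abstract refers to. It is not the fast heuristic proposed in the paper, which the abstract does not describe. The function name `kmeans`, its parameters, and the toy data are illustrative assumptions.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Textbook Lloyd's k-means on a list of equal-length numeric tuples.

    Illustrative baseline only; NOT the heuristic from the paper.
    """
    rng = random.Random(seed)
    # Randomly selected initial representatives: the step the abstract
    # names as a risk, since a poor draw can slow or degrade clustering.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance, which favors spherical clusters).
        new_labels = [
            min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so the iteration has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Toy usage with two well-separated groups (made-up data).
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centroids, labels = kmeans(points, k=2)
```

Each pass compares every point with every centroid, so one iteration costs on the order of n * k * d distance terms for n points of dimension d. This is the runtime cost that motivates faster heuristics on big data.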
Pages: 925-932
Number of pages: 8