Strategies for Big Data Clustering

Cited by: 28
Authors
Kurasova, Olga [1]
Marcinkevicius, Virginijus [1]
Medvedev, Viktor [1]
Rapecka, Aurimas [1]
Stefanovic, Pavel [1]
Affiliation
[1] Vilnius State Univ, Inst Math & Informat, LT-08663 Vilnius, Lithuania
Source
2014 IEEE 26TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI) | 2014
Keywords
big data; clustering methods; data mining; Hadoop; visual analysis
DOI
10.1109/ICTAI.2014.115
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents an overview of methods and technologies used for big data clustering. Clustering is one of the important data mining tasks, especially in big data analysis, where large volumes of data must be grouped. Several clustering methods are described, with particular attention to the k-means method and its modifications, because k-means remains one of the popular methods and is implemented in innovative technologies for big data analysis. Neural network-based self-organizing maps and their extensions for big data clustering are also reviewed. Some strategies for big data clustering are presented and discussed. The paper shows what volume of data can be clustered in the well-known data mining systems WEKA and KNIME, and when new, more sophisticated technologies are needed.
Pages: 740-747
Page count: 8