Consensus Big Data Clustering for Bayesian Mixture Models

Cited by: 3
Authors
Karras, Christos [1]
Karras, Aristeidis [1]
Giotopoulos, Konstantinos C. [2]
Avlonitis, Markos [3]
Sioutas, Spyros [1]
Affiliations
[1] Univ Patras, Comp Engn & Informat Dept, Patras 26504, Greece
[2] Univ Patras, Dept Management Sci & Technol, Patras 26334, Greece
[3] Ionian Univ, Dept Informat, Kerkira 49100, Greece
Keywords
stochastic data engineering; cluster analysis; Bayesian mixture modelling; consensus clustering; big-data management and analytics; NUMBER
DOI
10.3390/a16050245
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In the context of big-data analysis, the clustering technique holds significant importance for the effective categorization and organization of extensive datasets. However, pinpointing the ideal number of clusters and handling high-dimensional data can be challenging. To tackle these issues, several strategies have been suggested, such as a consensus clustering ensemble that yields more significant outcomes compared to individual models. Another valuable technique for cluster analysis is Bayesian mixture modelling, which is known for its adaptability in determining cluster numbers. Traditional inference methods such as Markov chain Monte Carlo may be computationally demanding and limit the exploration of the posterior distribution. In this work, we introduce an innovative approach that combines consensus clustering and Bayesian mixture models to improve big-data management and simplify the process of identifying the optimal number of clusters in diverse real-world scenarios. By addressing the aforementioned hurdles and boosting accuracy and efficiency, our method considerably enhances cluster analysis. This fusion of techniques offers a powerful tool for managing and examining large and intricate datasets, with possible applications across various industries.
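The abstract does not spell out the algorithm, so the following is only a minimal sketch of the generic consensus-clustering idea it refers to, not the authors' method. It assumes that several short runs of some mixture-model sampler have each produced one label vector; the `partitions` data, the function names `consensus_matrix` and `consensus_clusters`, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
def consensus_matrix(partitions):
    """Fraction of partitions in which each pair of items shares a cluster."""
    n = len(partitions[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    runs = len(partitions)
    return [[c / runs for c in row] for row in counts]


def consensus_clusters(matrix, threshold=0.5):
    """Link items that co-cluster more often than `threshold`, then
    return the connected components as the final labels."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Hypothetical label vectors from three short runs over five items.
# Label values are arbitrary per run; only co-membership matters.
partitions = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 2],
    [0, 0, 0, 1, 1],
]
matrix = consensus_matrix(partitions)
final_labels = consensus_clusters(matrix)
num_clusters = len(set(final_labels))
```

In this sketch the number of clusters is not fixed in advance: it falls out of how often items co-cluster across runs, which is one simple way an ensemble can sidestep choosing the cluster count up front. Thresholded connected components is only one of several possible ways to summarize a consensus matrix.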
Pages: 18