Multivariate bounded support Kotz mixture model with semi-supervised projected model-based clustering

Times cited: 0
Authors
Araya, Tsega Weldu [1]
Azam, Muhammad [2]
Bouguila, Nizar [1]
Bentahar, Jamal [1,3]
Affiliations
[1] Concordia Univ, Concordia Inst Informat Syst Engn, Montreal, PQ, Canada
[2] Algoma Univ, Fac Comp Sci & Technol, Sault Ste Marie, ON, Canada
[3] Khalifa Univ, 6G Res Ctr, Dept Comp Sci, Abu Dhabi, U Arab Emirates
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Multivariate bounded Kotz mixture model (BKMM); Semi-supervised learning; Model-based clustering; Semi-supervised projected model-based clustering (SeSProC); Minimum message length (MML); Data clustering; Histogram of oriented gradients (HOG); Unsupervised selection; Oriented gradients; Distributions; Segmentation
DOI
10.1016/j.inffus.2025.103330
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Data clustering is a crucial technique in data analysis, aimed at identifying and grouping similar data points to uncover underlying structures within a dataset. We propose a new unsupervised clustering approach based on a multivariate bounded Kotz mixture model (BKMM) for modeling data that lie within a bounded support region. In many real applications the observations fall within such limits, and BKMM handles them directly, accurately modeling and clustering them. In BKMM, the parameters are estimated by maximizing the log-likelihood with the Expectation-Maximization (EM) algorithm and the Newton-Raphson method. We also explore how semi-supervised learning improves clustering performance by incorporating a small amount of labeled data to guide the clustering process. To this end, we propose a bounded Kotz mixture model with a semi-supervised projected model-based clustering method (BKMM-SeSProC) to obtain hidden cluster labels. Model selection is essential for determining the optimal number of mixture components, so we introduce a minimum message length (MML) model selection criterion to find the best number of clusters in the BKMM-SeSProC approach, and we apply a greedy forward search to estimate that number. We evaluate both proposed models, BKMM and BKMM-SeSProC, on the same datasets for data clustering, and we use MML model selection with BKMM-SeSProC to determine the number of components. We first validate both models and the model selection process in various medical applications. To assess their broader performance, we then test the models on image datasets, including Alzheimer's disease, lung tissue, and gastrointestinal tract images for disease recognition, and the CIFAR-100 dataset for object categorization.
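The bounded density and the role of labeled data in the E-step can be illustrated with a short NumPy sketch. It is a minimal sketch under stated assumptions, not the authors' implementation. It uses the standard multivariate Kotz-type density with shape parameters N, r, s, which may differ from the paper's exact parameterization. It assumes the bounded support is an axis-aligned box [lower, upper] whose probability mass is estimated by Monte Carlo. It replaces SeSProC with a simpler semi-supervised E-step in which labeled points keep their known component. The helper names `kotz_logpdf`, `bounded_kotz_logpdf`, and `semi_supervised_responsibilities` are hypothetical, and the M-step (including the Newton-Raphson updates) is omitted.

```python
import math

import numpy as np


def kotz_logpdf(X, mu, Sigma, N=1.0, r=0.5, s=1.0):
    """Log-density of the standard multivariate Kotz-type distribution, one value per row of X."""
    X = np.atleast_2d(X)
    d = X.shape[1]
    # Mahalanobis form q = (x - mu)^T Sigma^{-1} (x - mu), computed via a Cholesky factor
    L = np.linalg.cholesky(Sigma)
    z = np.linalg.solve(L, (X - mu).T)
    q = np.sum(z ** 2, axis=0)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    # Normalizing constant: s * Gamma(d/2) * r^a / (pi^(d/2) * Gamma(a)), a = (2N + d - 2) / (2s)
    a = (2.0 * N + d - 2.0) / (2.0 * s)
    log_c = (math.log(s) + math.lgamma(d / 2.0) + a * math.log(r)
             - 0.5 * d * math.log(math.pi) - math.lgamma(a))
    # (N - 1) * log(q), using the convention 0 * log(0) = 0 when N == 1
    with np.errstate(divide="ignore"):
        poly = np.zeros_like(q) if N == 1.0 else (N - 1.0) * np.log(q)
    return log_c - 0.5 * log_det + poly - r * q ** s


def bounded_kotz_logpdf(X, mu, Sigma, lower, upper, n_mc=20000, seed=0, **shape):
    """Kotz density truncated to the box [lower, upper] (assumed support) and renormalized."""
    X = np.atleast_2d(X)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    # Monte Carlo estimate of the mass the unbounded density places on the box
    rng = np.random.default_rng(seed)
    U = rng.uniform(lower, upper, size=(n_mc, X.shape[1]))
    mass = np.prod(upper - lower) * np.mean(np.exp(kotz_logpdf(U, mu, Sigma, **shape)))
    # Points outside the support get zero density (log-density -inf)
    inside = np.all((X >= lower) & (X <= upper), axis=1)
    return np.where(inside, kotz_logpdf(X, mu, Sigma, **shape) - np.log(mass), -np.inf)


def semi_supervised_responsibilities(log_dens, weights, labels):
    """E-step: posteriors for unlabeled rows (label -1); labeled rows are clamped to one-hot."""
    log_post = log_dens + np.log(weights)  # shape (n_points, K)
    log_post -= log_post.max(axis=1, keepdims=True)  # stabilize before exponentiating
    resp = np.exp(log_post)
    resp /= resp.sum(axis=1, keepdims=True)
    labeled = labels >= 0
    resp[labeled] = np.eye(len(weights))[labels[labeled]]
    return resp
```

With N=1, r=0.5, and s=1 the Kotz-type density reduces to the Gaussian, which gives a quick sanity check. A component restricted to the unit square would be evaluated as `bounded_kotz_logpdf(X, mu, Sigma, lower=[0, 0], upper=[1, 1])`, and its columns stacked into `log_dens` for the E-step. All data points are assumed to lie inside the box.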
BKMM is compared with the Kotz mixture model (KMM), Student's t mixture model (SMM), Laplace mixture model (LMM), bounded Gaussian mixture model (BGMM), and Gaussian mixture model (GMM) under similar experimental settings across all datasets, and several performance metrics are used to evaluate BKMM and BKMM-SeSProC. To assess how well MML selects the number of clusters for BKMM-SeSProC, we compare it against seven different model selection criteria. The experimental results show that BKMM outperforms KMM, SMM, LMM, BGMM, and GMM in all applications, and that the semi-supervised projected model-based clustering (BKMM-SeSProC) outperforms unsupervised BKMM on all evaluation metrics.
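The MML-driven greedy forward search over the number of clusters can be sketched in a few lines of Python. As an assumption, the sketch uses the Figueiredo-Jain form of the MML criterion for finite mixtures in place of the paper's own BKMM-SeSProC message length. `fit_fn` is a hypothetical callback that fits a K-component model and returns its log-likelihood and mixing weights. `n_params` is the number of free parameters per component. Both helper names are illustrative, not from the paper.

```python
import math
from typing import Callable, Sequence, Tuple


def message_length(log_lik: float, weights: Sequence[float], n: int, n_params: int) -> float:
    """Figueiredo-Jain style MML message length of a K-component mixture (lower is better)."""
    K = len(weights)
    # Cost of encoding each component's parameters, scaled by its effective sample size
    penalty = 0.5 * n_params * sum(math.log(n * w / 12.0) for w in weights)
    # Cost of encoding the mixing weights plus the constant terms
    penalty += 0.5 * K * math.log(n / 12.0) + 0.5 * K * (n_params + 1)
    return penalty - log_lik


def greedy_forward_search(fit_fn: Callable[[int], Tuple[float, Sequence[float]]],
                          n: int, n_params: int, k_max: int = 10) -> int:
    """Increase K from 1 and stop as soon as the message length no longer decreases."""
    best_k, best_len = 1, message_length(*fit_fn(1), n, n_params)
    for k in range(2, k_max + 1):
        length = message_length(*fit_fn(k), n, n_params)
        if length >= best_len:
            break
        best_k, best_len = k, length
    return best_k
```

Stopping at the first increase is what makes the search greedy. It avoids fitting every K up to `k_max`, but it can stop early if the message length is not unimodal in K.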
Pages: 22