Enhanced Bayesian Gaussian hidden Markov mixture clustering for improved knowledge discovery

Times cited: 0

Authors:

Ganesan, Anusha [1]

Paul, Anand [2]

Kim, Sungho [1]

Affiliations:

[1] Yeungnam Univ, Dept Elect Engn, Gyongsan 38541, South Korea

[2] Louisiana State Univ, Dept Biostat & Data Sci, Hlth Sci Ctr, New Orleans, LA 70112 USA

Source:

PATTERN ANALYSIS AND APPLICATIONS | 2024 / Vol. 27 / No. 04

Funding:

National Research Foundation of Singapore;

Keywords:

Baum-Welch algorithm; Bayesian; Bayesian Gaussian mixture model; Clustering; Cross-validation; Distance metric; Gaussian; Hidden Markov model; Mixture models; Viterbi algorithm; MODELS; HMM;

DOI:

10.1007/s10044-024-01374-w

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405;

Abstract:

The hidden Markov model (HMM) is widely used in natural language processing, speech recognition, autonomous vehicular systems, and healthcare for tasks such as clustering, pattern recognition, predictive modeling, anomaly detection, and time-series forecasting. However, HMMs can be sensitive to their initial states, which compromises clustering reliability. To address this issue, we propose integrating an HMM with hybrid distance metric learning and a modified Bayesian Gaussian mixture model (BGMM) to improve clustering performance and robustness. A significant challenge in HMM applications is determining the optimal number of hidden states; we address this with a k-fold cross-validation strategy. Applying our Bayesian Gaussian Hidden Markov Mixture Clustering Model (BGH2MCM) to five diverse datasets, we group the observed data sequences according to their underlying hidden state sequences. This approach outperforms conventional techniques such as K-means, agglomerative clustering, density-based spatial clustering of applications with noise (DBSCAN), and the BGMM. We evaluate the model using silhouette, Davies-Bouldin, and Calinski-Harabasz scores, accuracy metrics, and computation time. Our results show that the BGH2MCM consistently achieves better clustering quality and computational efficiency: across all datasets, its average computation time is 23% lower than agglomerative clustering with an HMM, 22% lower than DBSCAN with an HMM, and 14% lower than K-means with an HMM and a BGMM-HMM. This study highlights the potential of the BGH2MCM to improve data mining and knowledge discovery from complex, real-world datasets.
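The abstract groups observed sequences by their underlying hidden state sequences, and the keywords list the Viterbi algorithm. The following is a minimal sketch, not the authors' BGH2MCM. It assumes an already-fitted one-dimensional HMM with Gaussian emissions and hand-picked parameters, and shows how Viterbi decoding can turn each observation's most likely hidden state into a cluster label. The function names `gaussian_logpdf` and `viterbi_gaussian` and all parameter values are hypothetical. The sketch omits Baum-Welch fitting, distance metric learning, the BGMM component, and the k-fold selection of the number of states.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_logpdf(x: float, mean: float, var: float) -> float:
    """Log density of a univariate normal N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi_gaussian(
    obs: Sequence[float],
    start: Sequence[float],
    trans: Sequence[Sequence[float]],
    means: Sequence[float],
    variances: Sequence[float],
) -> Tuple[List[int], float]:
    """Most likely hidden-state path for a 1-D Gaussian-emission HMM.

    Hypothetical helper for illustration; parameters are assumed known.
    Works in log space to avoid numerical underflow on long sequences.
    Returns (state_path, log_probability_of_that_path).
    """
    n_states = len(start)
    log_trans = [[math.log(p) if p > 0 else -math.inf for p in row] for row in trans]
    # delta[s]: best log-probability of any path that ends in state s at time t
    delta = [
        (math.log(start[s]) if start[s] > 0 else -math.inf)
        + gaussian_logpdf(obs[0], means[s], variances[s])
        for s in range(n_states)
    ]
    backpointers: List[List[int]] = []
    for x in obs[1:]:
        new_delta: List[float] = []
        pointers: List[int] = []
        for s in range(n_states):
            # Best predecessor state r for landing in state s at this step
            best_prev = max(range(n_states), key=lambda r: delta[r] + log_trans[r][s])
            pointers.append(best_prev)
            new_delta.append(
                delta[best_prev]
                + log_trans[best_prev][s]
                + gaussian_logpdf(x, means[s], variances[s])
            )
        delta = new_delta
        backpointers.append(pointers)
    # Trace the best path backwards from the most probable final state
    last = max(range(n_states), key=lambda s: delta[s])
    best_logp = delta[last]
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best_logp


# Hypothetical two-state example: state 0 near 0.0, state 1 near 5.0
obs = [0.1, -0.2, 0.05, 5.1, 4.8, 5.3, 0.0]
labels, logp = viterbi_gaussian(
    obs,
    start=[0.5, 0.5],
    trans=[[0.9, 0.1], [0.1, 0.9]],
    means=[0.0, 5.0],
    variances=[1.0, 1.0],
)
print(labels)  # [0, 0, 0, 1, 1, 1, 0]
```

The decoded state indices act as cluster labels. The "sticky" transition matrix discourages switching unless the emission evidence is strong. In a complete pipeline the number of states would be fixed beforehand, for example by k-fold cross-validation, which the abstract names as the authors' strategy; the selection criterion used in the paper is not given in this record.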

Pages: 16

References: 33
