A hybrid SEM algorithm for high-dimensional unsupervised learning using a finite generalized Dirichlet mixture

Cited by: 76
Authors
Bouguila, Nizar [1]
Ziou, Djemel [2]
Affiliations
[1] Concordia Univ, CIISE, Montreal, PQ H3G 1T7, Canada
[2] Univ Sherbrooke, Dept Informat, Sherbrooke, PQ J1K 2R1, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
clustering; correlogram; expectation maximization (EM); finite mixture models; generalized Dirichlet; high-dimensional data; hybrid stochastic expectation maximization algorithm (HSEM); image database summarization; image object recognition; image restoration; maximum likelihood (ML); SEM; Vistex;
DOI
10.1109/TIP.2006.877379
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper applies a robust statistical scheme to the problem of unsupervised learning of high-dimensional data. We develop, analyze, and apply a new finite mixture model based on a generalization of the Dirichlet distribution. The generalized Dirichlet distribution has a more general covariance structure than the Dirichlet distribution and offers high flexibility and ease of use for the approximation of both symmetric and asymmetric distributions. We show that the mathematical properties of this distribution allow high-dimensional modeling without requiring dimensionality reduction and, thus, without a loss of information. This makes the generalized Dirichlet distribution more practical and useful. We propose a hybrid stochastic expectation maximization algorithm (HSEM) to estimate the parameters of the generalized Dirichlet mixture. The algorithm is called stochastic because it contains a step in which the data elements are assigned randomly to components in order to avoid convergence to a saddle point. The adjective "hybrid" is justified by the introduction of a Newton-Raphson step. Moreover, the HSEM algorithm autonomously selects the number of components by the introduction of an agglomerative term. The performance of our method is tested by the classification of several pattern-recognition data sets. The generalized Dirichlet mixture is also applied to the problems of image restoration, image object recognition and texture image database summarization for efficient retrieval. For the texture image summarization problem, results are reported for the Vistex texture image database from the MIT Media Lab.
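The stochastic step mentioned in the abstract, in which data elements are assigned randomly to components, can be illustrated with a minimal sketch. It assumes that posterior membership probabilities (one row per data point, one column per component) have already been computed; the function name `stochastic_assignment` and the inverse-CDF sampling scheme are illustrative assumptions, not taken from the paper, and the Newton-Raphson and agglomerative parts of HSEM are not shown.

```python
import numpy as np

def stochastic_assignment(resp, rng):
    """Draw one component label per data point from its posterior row.

    resp: array of shape (n_points, n_components) with non-negative
          membership weights (hypothetical input, assumed precomputed).
    rng:  a numpy.random.Generator.
    """
    resp = np.asarray(resp, dtype=float)
    # Normalize each row so it is a proper probability vector.
    resp = resp / resp.sum(axis=1, keepdims=True)
    # Inverse-CDF sampling: count how many cumulative bounds u has passed.
    cdf = np.cumsum(resp, axis=1)
    u = rng.random((resp.shape[0], 1))
    labels = (u >= cdf).sum(axis=1)
    # Guard against round-off in the last cumulative sum.
    return np.minimum(labels, resp.shape[1] - 1)

rng = np.random.default_rng(0)
labels = stochastic_assignment([[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]], rng)
```

Each call yields a hard partition of the data, so component parameters can then be re-estimated from the points currently assigned to each component; the randomness in the labels is what lets the iteration move away from a stationary point that a deterministic update would stay at.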
Pages: 2657-2668
Number of pages: 12