A hybrid SEM algorithm for high-dimensional unsupervised learning using a finite generalized dirichlet mixture

Cited by: 76
Authors
Bouguila, Nizar [1]
Ziou, Djemel
Affiliations
[1] Concordia Univ, CIISE, Montreal, PQ H3G 1T7, Canada
[2] Univ Sherbrooke, Dept Informat, Sherbrooke, PQ J1K 2R1, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
clustering; correlogram; expectation maximization (EM); finite mixture models; generalized Dirichlet; high-dimensional data; hybrid stochastic expectation maximization algorithm (HSEM); image database summarization; image object recognition; image restoration; maximum likelihood (ML); SEM; Vistex;
DOI
10.1109/TIP.2006.877379
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper applies a robust statistical scheme to the problem of unsupervised learning of high-dimensional data. We develop, analyze, and apply a new finite mixture model based on a generalization of the Dirichlet distribution. The generalized Dirichlet distribution has a more general covariance structure than the Dirichlet distribution and offers high flexibility and ease of use for the approximation of both symmetric and asymmetric distributions. We show that the mathematical properties of this distribution allow high-dimensional modeling without requiring dimensionality reduction and, thus, without a loss of information. This makes the generalized Dirichlet distribution more practical and useful. We propose a hybrid stochastic expectation maximization algorithm (HSEM) to estimate the parameters of the generalized Dirichlet mixture. The algorithm is called stochastic because it contains a step in which the data elements are assigned randomly to components in order to avoid convergence to a saddle point. The adjective "hybrid" is justified by the introduction of a Newton-Raphson step. Moreover, the HSEM algorithm autonomously selects the number of components by the introduction of an agglomerative term. The performance of our method is tested by the classification of several pattern-recognition data sets. The generalized Dirichlet mixture is also applied to the problems of image restoration, image object recognition and texture image database summarization for efficient retrieval. For the texture image summarization problem, results are reported for the Vistex texture image database from the MIT Media Lab.
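The random assignment step mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the posterior membership probabilities of each data point have already been computed; the function name `stochastic_assign` and the sample values are hypothetical and not taken from the paper. The Newton-Raphson update and the agglomerative term of HSEM are not shown.

```python
import random


def stochastic_assign(responsibilities, rng=None):
    """Draw one component label per data point from its posterior probabilities.

    `responsibilities` is a list of rows; row i holds the (assumed precomputed)
    probabilities that point i belongs to each mixture component.
    """
    rng = rng or random.Random()
    labels = []
    for probs in responsibilities:
        components = range(len(probs))
        # Sample a single component index, weighted by the posterior row.
        labels.append(rng.choices(components, weights=probs, k=1)[0])
    return labels


# Hypothetical posteriors for three points and two components.
posteriors = [[0.9, 0.1], [0.5, 0.5], [0.0, 1.0]]
labels = stochastic_assign(posteriors, random.Random(0))
```

Sampling a hard label instead of keeping soft weights is what makes the step stochastic: repeated runs can place an ambiguous point (such as the second row) in different components, which is the kind of perturbation the abstract credits with avoiding convergence to a saddle point.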
Pages: 2657 - 2668
Page count: 12
Related papers
46 in total
  • [1] High-dimensional unsupervised selection and estimation of a finite generalized Dirichlet mixture model based on minimum message length
    Bouguila, Nizar
    Ziou, Djemel
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2007, 29 (10) : 1716 - 1731
  • [2] An algorithm for unsupervised learning and optimization of finite mixture models
    Abas, Ahmed R.
    EGYPTIAN INFORMATICS JOURNAL, 2011, 12 (01) : 19 - 27
  • [3] An Infinite Mixture Model of Generalized Inverted Dirichlet Distributions for High-Dimensional Positive Data Modeling
    Bouguila, Nizar
    Al Mashrgy, Mohamed
    INFORMATION AND COMMUNICATION TECHNOLOGY, 2014, 8407 : 296 - 305
  • [4] Simultaneous Bayesian clustering and feature selection using RJMCMC-based learning of finite generalized Dirichlet mixture models
    Elguebaly, Tarek
    Bouguila, Nizar
    SIGNAL PROCESSING, 2013, 93 (06) : 1531 - 1546
  • [5] Hybrid fast unsupervised feature selection for high-dimensional data
    Manbari, Zhaleh
    AkhlaghianTab, Fardin
    Salavati, Chiman
    EXPERT SYSTEMS WITH APPLICATIONS, 2019, 124 : 97 - 118
  • [6] Identifying redundant features using unsupervised learning for high-dimensional data
    Danasingh, Asir Antony Gnana Singh
    Subramanian, Appavu alias Balamurugan
    Epiphany, Jebamalar Leavline
    SN APPLIED SCIENCES, 2020, 2 (08):
  • [7] Identifying redundant features using unsupervised learning for high-dimensional data
    Asir Antony Gnana Singh Danasingh
    Appavu alias Balamurugan Subramanian
    Jebamalar Leavline Epiphany
    SN Applied Sciences, 2020, 2
  • [8] Flexible High-Dimensional Unsupervised Learning with Missing Data
    Wei, Yuhong
    Tang, Yang
    McNicholas, Paul D.
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2020, 42 (03) : 610 - 621
  • [9] Bayesian variable selection in clustering high-dimensional data via a mixture of finite mixtures
    Doo, Woojin
    Kim, Heeyoung
    JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2021, 91 (12) : 2551 - 2568
  • [10] Entropy-Based Variational Learning of Finite Generalized Inverted Dirichlet Mixture Model
    Ahmadzadeh, Mohammad Sadegh
    Manouchehri, Narges
    Ennajari, Hafsa
    Bouguila, Nizar
    Fan, Wentao
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2021, 2021, 12672 : 130 - 143