A hybrid SEM algorithm for high-dimensional unsupervised learning using a finite generalized Dirichlet mixture

Cited by: 76
Authors
Bouguila, Nizar [1]
Ziou, Djemel
Affiliations
[1] Concordia Univ, CIISE, Montreal, PQ H3G 1T7, Canada
[2] Univ Sherbrooke, Dept Informat, Sherbrooke, PQ J1K 2R1, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
clustering; correlogram; expectation maximization (EM); finite mixture models; generalized Dirichlet; high-dimensional data; hybrid stochastic expectation maximization algorithm (HSEM); image database summarization; image object recognition; image restoration; maximum likelihood (ML); SEM; Vistex;
DOI
10.1109/TIP.2006.877379
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper applies a robust statistical scheme to the problem of unsupervised learning of high-dimensional data. We develop, analyze, and apply a new finite mixture model based on a generalization of the Dirichlet distribution. The generalized Dirichlet distribution has a more general covariance structure than the Dirichlet distribution and offers high flexibility and ease of use for the approximation of both symmetric and asymmetric distributions. We show that the mathematical properties of this distribution allow high-dimensional modeling without requiring dimensionality reduction and, thus, without a loss of information. This makes the generalized Dirichlet distribution more practical and useful. We propose a hybrid stochastic expectation maximization algorithm (HSEM) to estimate the parameters of the generalized Dirichlet mixture. The algorithm is called stochastic because it contains a step in which the data elements are assigned randomly to components in order to avoid convergence to a saddle point. The adjective "hybrid" is justified by the introduction of a Newton-Raphson step. Moreover, the HSEM algorithm autonomously selects the number of components by the introduction of an agglomerative term. The performance of our method is tested by the classification of several pattern-recognition data sets. The generalized Dirichlet mixture is also applied to the problems of image restoration, image object recognition and texture image database summarization for efficient retrieval. For the texture image summarization problem, results are reported for the Vistex texture image database from the MIT Media Lab.
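The stochastic step described in the abstract, in which data elements are assigned randomly to components, can be illustrated with a minimal sketch. It assumes that posterior membership probabilities have already been computed in an E-step. The function name `stochastic_assign` and the array `resp` are hypothetical and are not taken from the paper; the Newton-Raphson update and the agglomerative term are not shown.

```python
import numpy as np


def stochastic_assign(resp, rng):
    """Draw one component label per data point from its posterior probabilities.

    resp: array of shape (n_points, n_components); each row holds non-negative
          membership weights (assumed to come from a preceding E-step).
    rng:  a numpy.random.Generator used for the random draws.
    """
    resp = np.asarray(resp, dtype=float)
    # Normalize each row so it sums to one.
    resp = resp / resp.sum(axis=1, keepdims=True)
    # Inverse-CDF sampling: count how many cumulative bins the uniform draw has passed.
    cum = np.cumsum(resp, axis=1)
    u = rng.random((resp.shape[0], 1))
    labels = (u >= cum).sum(axis=1)
    # Guard against floating-point rounding in the last cumulative bin.
    return np.minimum(labels, resp.shape[1] - 1)


# Illustrative usage with made-up posteriors for four points and two components.
rng = np.random.default_rng(0)
posteriors = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8], [1.0, 0.0]])
labels = stochastic_assign(posteriors, rng)
```

In a full stochastic EM iteration, the sampled labels would partition the data before the parameters of each component are re-estimated. Sampling, rather than always taking the most probable component, is what lets the procedure move away from a stationary point of the deterministic updates.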
Pages: 2657-2668
Number of pages: 12