A hybrid SEM algorithm for high-dimensional unsupervised learning using a finite generalized Dirichlet mixture

Cited by: 76
Authors
Bouguila, Nizar [1]
Ziou, Djemel
Affiliations
[1] Concordia Univ, CIISE, Montreal, PQ H3G 1T7, Canada
[2] Univ Sherbrooke, Dept Informat, Sherbrooke, PQ J1K 2R1, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
clustering; correlogram; expectation maximization (EM); finite mixture models; generalized Dirichlet; high-dimensional data; hybrid stochastic expectation maximization algorithm (HSEM); image database summarization; image object recognition; image restoration; maximum likelihood (ML); SEM; Vistex
DOI
10.1109/TIP.2006.877379
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper applies a robust statistical scheme to the problem of unsupervised learning of high-dimensional data. We develop, analyze, and apply a new finite mixture model based on a generalization of the Dirichlet distribution. The generalized Dirichlet distribution has a more general covariance structure than the Dirichlet distribution and offers high flexibility and ease of use for the approximation of both symmetric and asymmetric distributions. We show that the mathematical properties of this distribution allow high-dimensional modeling without requiring dimensionality reduction and, thus, without a loss of information. This makes the generalized Dirichlet distribution more practical and useful. We propose a hybrid stochastic expectation maximization algorithm (HSEM) to estimate the parameters of the generalized Dirichlet mixture. The algorithm is called stochastic because it contains a step in which the data elements are assigned randomly to components in order to avoid convergence to a saddle point. The adjective "hybrid" is justified by the introduction of a Newton-Raphson step. Moreover, the HSEM algorithm autonomously selects the number of components by the introduction of an agglomerative term. The performance of our method is tested by the classification of several pattern-recognition data sets. The generalized Dirichlet mixture is also applied to the problems of image restoration, image object recognition and texture image database summarization for efficient retrieval. For the texture image summarization problem, results are reported for the Vistex texture image database from the MIT Media Lab.
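The abstract names two concrete ingredients: the generalized Dirichlet density and the stochastic step that randomly assigns each observation to a mixture component. The Python sketch below illustrates both under stated assumptions. It uses the commonly cited Connor-Mosimann form of the generalized Dirichlet density, with gamma_i = beta_i - alpha_{i+1} - beta_{i+1} for i < d and gamma_d = beta_d - 1, and the helper names gd_log_density, responsibilities and stochastic_assign are hypothetical. It is not the authors' HSEM implementation: the Newton-Raphson parameter update and the agglomerative term that selects the number of components are omitted.

```python
import math
import random
from typing import List, Sequence, Tuple

Params = Tuple[Sequence[float], Sequence[float]]  # (alpha, beta) for one component


def gd_log_density(x: Sequence[float], alpha: Sequence[float], beta: Sequence[float]) -> float:
    """Log-density of a d-dimensional generalized Dirichlet distribution.

    Assumes the Connor-Mosimann parameterization; requires alpha_i, beta_i > 0.
    The support is x_i > 0 with sum(x) < 1.
    """
    d = len(x)
    if not (len(alpha) == len(beta) == d):
        raise ValueError("x, alpha and beta must have the same length")
    if any(v <= 0.0 for v in x) or sum(x) >= 1.0:
        return -math.inf  # observation lies outside the support
    log_p = 0.0
    cumsum = 0.0
    for i in range(d):
        cumsum += x[i]
        # gamma_i = beta_i - alpha_{i+1} - beta_{i+1}; the last one is beta_d - 1
        gamma_i = beta[i] - alpha[i + 1] - beta[i + 1] if i < d - 1 else beta[i] - 1.0
        log_p += (math.lgamma(alpha[i] + beta[i]) - math.lgamma(alpha[i]) - math.lgamma(beta[i])
                  + (alpha[i] - 1.0) * math.log(x[i])
                  + gamma_i * math.log(1.0 - cumsum))
    return log_p


def responsibilities(x: Sequence[float], weights: Sequence[float],
                     params: Sequence[Params]) -> List[float]:
    """E-step: posterior probability of each component for one observation."""
    logs = [math.log(w) + gd_log_density(x, a, b) for w, (a, b) in zip(weights, params)]
    m = max(logs)
    if m == -math.inf:
        raise ValueError("observation lies outside the support of every component")
    exps = [math.exp(v - m) for v in logs]  # log-sum-exp shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def stochastic_assign(data: Sequence[Sequence[float]], weights: Sequence[float],
                      params: Sequence[Params], rng: random.Random) -> List[int]:
    """S-step: draw one component label per observation from its posterior."""
    labels = []
    for x in data:
        z = responsibilities(x, weights, params)
        labels.append(rng.choices(range(len(z)), weights=z, k=1)[0])
    return labels
```

Setting beta_i = alpha_{i+1} + beta_{i+1} makes every gamma_i with i < d vanish, which recovers the ordinary Dirichlet density with parameters (alpha_1, ..., alpha_d, beta_d); this gives a quick sanity check of the formula. In an SEM-style loop, the sampled labels would then be used to re-estimate each component's mixing weight and (alpha, beta) from the points assigned to it.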
Pages: 2657-2668
Page count: 12