Model-based subspace clustering of non-Gaussian data

Cited by: 19
Authors
Boutemedjet, Sabri [1]
Ziou, Djemel [1]
Bouguila, Nizar [2]
Affiliations
[1] Univ Sherbrooke, Dept Informat, Sherbrooke, PQ J1K 2R1, Canada
[2] Concordia Univ, Concordia Inst Informat Syst Engn, Montreal, PQ H3G 2W1, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Clustering; Finite mixture models; Subspace clustering; Feature selection and extraction; Minimum message length; Bayesian information criterion; DIRICHLET MIXTURE MODEL; FEATURE-SELECTION; UNSUPERVISED SELECTION; CLASSIFICATION;
DOI
10.1016/j.neucom.2009.11.044
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a new generalized Dirichlet (GD) mixture model to address the challenging problem of clustering multidimensional data sets on different feature subsets. We approximate the class-conditional distributions of the mixture components to define a binary relevance of features at the cluster level. We consider a feature relevant if it provides the knowledge needed to assign data points to the cluster. We then define a new message length objective to learn the model and to select both the feature subsets and the number of components. The proposed method is more general than existing feature selection and subspace clustering models. In addition, it selects for each cluster only relevant and statistically independent features, in time linear in the number of observations and dimensions. Experiments on synthetic data and on unsupervised image categorization show the merits of our approach. (C) 2010 Elsevier B.V. All rights reserved.
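For readers unfamiliar with the distribution named in the abstract, the following is a sketch of the standard generalized Dirichlet density and the finite mixture built from it. The notation (X, alpha, beta, gamma, pi, K, D, W) is assumed here for illustration and is not taken from this record; the paper's own notation and its feature-relevance extension may differ.

```latex
% Generalized Dirichlet density for X = (X_1, ..., X_D),
% with X_d > 0 and X_1 + ... + X_D < 1 (standard form; notation assumed).
\[
p(X \mid \alpha, \beta)
  = \prod_{d=1}^{D}
    \frac{\Gamma(\alpha_d + \beta_d)}{\Gamma(\alpha_d)\,\Gamma(\beta_d)}\,
    X_d^{\alpha_d - 1}
    \Bigl(1 - \sum_{j=1}^{d} X_j\Bigr)^{\gamma_d},
\]
\[
\gamma_d = \beta_d - \alpha_{d+1} - \beta_{d+1} \quad (d = 1, \dots, D-1),
\qquad
\gamma_D = \beta_D - 1 .
\]

% Finite mixture with K components and mixing weights \pi_k.
\[
p(X \mid \Theta) = \sum_{k=1}^{K} \pi_k \, p(X \mid \alpha_k, \beta_k),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1 .
\]

% A known property of the GD distribution: the transformed variables below
% are mutually independent, each following a Beta distribution.
\[
W_1 = X_1, \qquad
W_d = \frac{X_d}{1 - \sum_{j=1}^{d-1} X_j} \quad (d = 2, \dots, D),
\qquad
W_d \sim \mathrm{Beta}(\alpha_d, \beta_d) .
\]
```

The last property is one plausible reading of how a GD-based model can work with statistically independent features within each cluster, as the abstract states; the record itself does not spell out the construction.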
Pages: 1730-1739
Number of pages: 10