Variable Selection in Normal Mixture Model Based Clustering under Heteroscedasticity

Cited by: 1
Authors
Kim, Seung-Gu [1]
Affiliations
[1] Sangji Univ, Dept Data & Informat, 83 Usan Dong, Wonju 122807, South Korea
Keywords
Informative variables; variable selection; clustering; EM algorithm; microarray gene expression;
DOI
10.5351/KJAS.2011.24.6.1213
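Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification codes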
020208 ; 070103 ; 0714 ;
Abstract
In high-dimensional settings, where the number of variables far exceeds the number of observations, noninformative variables must be removed in order to cluster the observations. Most model-based approaches to variable selection assume homoscedasticity, and their models are mainly estimated by penalized likelihood methods. This paper proposes a different approach that removes noninformative variables effectively and clusters the observations simultaneously, based on a modified normal mixture model. The validity of the model is established, and an EM algorithm is derived to estimate its parameters. Simulation studies and an experiment on a real microarray dataset show the effectiveness of the proposed method.
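For orientation only, the following is a minimal sketch of a generic EM algorithm for a normal mixture with component-specific diagonal variances, which is one simple way to allow heteroscedasticity. It is not the modified mixture model or the variable selection procedure of the paper, neither of which is specified in this record. The function name `em_diag_gmm`, its parameters, the random initialisation, and the small variance floor are all assumptions made for illustration.

```python
import numpy as np


def em_diag_gmm(X, k, n_iter=50, seed=0):
    """Generic EM for a k-component normal mixture with diagonal,
    component-specific variances (illustrative; not the paper's model)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Assumed initialisation: means from k distinct observations,
    # variances from the pooled data, equal mixing proportions.
    mu = X[rng.choice(n, size=k, replace=False)].copy()
    var = np.tile(X.var(axis=0) + 1e-6, (k, 1))
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior membership probabilities, computed in log space.
        diff = X[:, None, :] - mu[None, :, :]              # shape (n, k, p)
        log_pdf = -0.5 * (np.log(2 * np.pi * var)[None] + diff**2 / var[None]).sum(axis=2)
        log_w = np.log(pi)[None, :] + log_pdf
        log_w -= log_w.max(axis=1, keepdims=True)          # numerical stability
        resp = np.exp(log_w)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: weighted updates of proportions, means, and variances.
        nk = resp.sum(axis=0) + 1e-12
        pi = nk / n
        mu = (resp.T @ X) / nk[:, None]
        diff = X[:, None, :] - mu[None, :, :]
        var = (resp[:, :, None] * diff**2).sum(axis=0) / nk[:, None] + 1e-6
    return pi, mu, var, resp
```

Calling `em_diag_gmm(X, k=2)` on an `(n, p)` array returns the mixing proportions, the component means, the per-component variances, and the membership probabilities; cluster labels can be read off with `resp.argmax(axis=1)`. A variable whose fitted means and variances are nearly identical across components carries little clustering information, which is the intuition behind removing noninformative variables, although the paper's actual criterion may differ.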
Pages: 1213-1224
Page count: 12