A simple model-based approach to variable selection in classification and clustering

Cited by: 3
Authors
Partovi Nia, Vahid [1, 2]
Davison, Anthony C. [3]
Affiliations
[1] Polytech Montreal, GERAD Res Ctr, Montreal, PQ J3T 1J4, Canada
[2] Polytech Montreal, Dept Math & Ind Engn, Montreal, PQ J3T 1J4, Canada
[3] Ecole Polytech Fed Lausanne, EPFL FSB MATHAA STAT, CH-1015 Lausanne, Switzerland
Source
CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE | 2015 / Vol. 43 / Issue 02
Funding
Natural Sciences and Engineering Research Council of Canada; Swiss National Science Foundation
Keywords
Classification; Clustering; high-dimensional data; hierarchical partitioning; Laplace distribution; mixture model; variable selection; MIXTURE MODEL; EXPRESSION; BAYES;
DOI
10.1002/cjs.11241
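Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract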
Clustering and classification of replicated data is often performed using classical techniques that inappropriately treat the data as unreplicated, or by complex modern ones that are computationally demanding. In this paper, we introduce a simple approach based on a spike-and-slab mixture model that is fast, automatic, allows classification, clustering and variable selection in a single framework, and can handle replicated or unreplicated data. Simulation shows that our approach compares well with other recently proposed methods. The ideas are illustrated by application to microarray and metabolomic data. The Canadian Journal of Statistics 43: 157-175; 2015 © 2015 Statistical Society of Canada
Pages: 157-175
Number of pages: 19