ST-PLS: a multi-directional nearest shrunken centroid type classifier via PLS

Cited by: 42
Authors
Saebo, S. [1]
Almoy, T. [1]
Aaroe, J. [2]
Aastveit, A. H. [1]
Affiliations
[1] Norwegian Univ Life Sci, Dept Chem Biotechnol & Food Sci, N-1432 As, Norway
[2] Norwegian Radium Hosp, Dept Genet, N-0310 Oslo, Norway
Keywords
classification; gene expression; soft-thresholding; variable selection
DOI
10.1002/cem.1101
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The nearest shrunken centroid (NSC) classifier is successfully applied for class prediction in a wide range of studies based on microarray data. The contribution from seemingly irrelevant variables to the classifier is minimized by the so-called soft-thresholding property of the approach. In this paper, we first show that for the two-class prediction problem, the NSC classifier is similar to a one-component discriminant partial least squares (PLS) model with soft-shrinkage of the loading weights. Then we introduce the soft-threshold-PLS (ST-PLS) as a general discriminant-PLS model with soft-thresholding of the loading weights of multiple latent components. This method is especially suited for classification and variable selection when the number of variables is large compared to the number of samples, which is typical for gene expression data. A characteristic feature of ST-PLS is the ability to identify important variables in multiple directions in the variable space. Both the ST-PLS and the NSC classifiers are applied to four real data sets. The results indicate that ST-PLS performs better than the shrunken centroid approach if there are several directions in the variable space which are important for classification, and there are strong dependencies between subsets of variables. Copyright (c) 2007 John Wiley & Sons, Ltd.
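The soft-thresholding of loading weights mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the function names `soft_threshold` and `soft_thresholded_weights`, the ±1 class coding, the scaling of the weights by their largest absolute value, and the parameter `eta` are all assumptions made here for illustration. The sketch computes a single PLS-style loading weight vector for a two-class problem and shrinks it so that weakly related variables receive a weight of exactly zero.

```python
import numpy as np


def soft_threshold(w, delta):
    """Shrink every entry of w toward zero by delta.

    Entries with |w_j| <= delta become zero, which is what removes
    seemingly irrelevant variables from the model.
    """
    return np.sign(w) * np.maximum(np.abs(w) - delta, 0.0)


def soft_thresholded_weights(X, y, eta):
    """Hypothetical helper: one soft-thresholded loading weight vector.

    X   : (n_samples, n_variables) data matrix
    y   : (n_samples,) class labels coded as +1 / -1 (assumed coding)
    eta : threshold in [0, 1), applied after scaling the weights so
          that the largest absolute weight equals 1 (assumed scaling)
    """
    Xc = X - X.mean(axis=0)            # centre each variable
    yc = y - y.mean()                  # centre the response
    w = Xc.T @ yc                      # covariance-type loading weights
    max_abs = np.max(np.abs(w))
    if max_abs == 0.0:
        raise ValueError("no variable covaries with the class labels")
    w = soft_threshold(w / max_abs, eta)
    return w / np.linalg.norm(w)       # unit-length weight vector


# Toy data: variable 0 separates the classes strongly, variable 1
# weakly, and variable 2 not at all.
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
X = np.column_stack([
    2.0 * y,
    0.2 * y,
    np.array([1.0, -1.0, 0.0, 1.0, -1.0, 0.0]),
])
w = soft_thresholded_weights(X, y, eta=0.5)
print(w)
```

In this toy example only the first variable keeps a nonzero weight. A multi-component model in the spirit of the abstract would repeat such a step on deflated data to find further directions, but the details of that procedure are not given in this record.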
Pages: 54-62
Page count: 9