ST-PLS: a multi-directional nearest shrunken centroid type classifier via PLS

Cited by: 42
Authors
Saebo, S. [1]
Almoy, T. [1]
Aaroe, J. [2]
Aastveit, A. H. [1]
Affiliations
[1] Norwegian Univ Life Sci, Dept Chem Biotechnol & Food Sci, N-1432 As, Norway
[2] Norwegian Radium Hosp, Dept Genet, N-0310 Oslo, Norway
Keywords
classification; gene expression; soft-thresholding; variable selection
DOI
10.1002/cem.1101
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The nearest shrunken centroid (NSC) classifier has been applied successfully for class prediction in a wide range of studies based on microarray data. The contribution of seemingly irrelevant variables to the classifier is minimized by the so-called soft-thresholding property of the approach. In this paper, we first show that for the two-class prediction problem, the NSC classifier is similar to a one-component discriminant partial least squares (PLS) model with soft-shrinkage of the loading weights. We then introduce soft-threshold-PLS (ST-PLS) as a general discriminant-PLS model with soft-thresholding of the loading weights of multiple latent components. This method is especially suited for classification and variable selection when the number of variables is large compared with the number of samples, which is typical for gene expression data. A characteristic feature of ST-PLS is its ability to identify important variables in multiple directions in the variable space. Both the ST-PLS and the NSC classifiers are applied to four real data sets. The results indicate that ST-PLS performs better than the shrunken centroid approach if several directions in the variable space are important for classification and there are strong dependencies between subsets of variables. Copyright (c) 2007 John Wiley & Sons, Ltd.
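The central idea of the abstract, soft-thresholding the loading weights of a discriminant PLS model, can be illustrated with a short Python/NumPy sketch. It is not the authors' implementation: the threshold rule (shrink every weight by `delta` times the largest absolute weight), the NIPALS-style deflation, the 0/1 coding of the two classes and the 0.5 classification cutoff are all assumptions, and `soft_threshold`, `st_pls_fit` and `st_pls_predict` are hypothetical names. With a 0/1 response, the first unthresholded weight vector X_c^T y_c equals (n_0 n_1 / n) times the difference between the two class centroids, so soft-thresholding it resembles shrinking centroids as in NSC (which, unlike this sketch, also divides by a within-class standard deviation). Further components allow additional directions in the variable space to contribute.

```python
import numpy as np


def soft_threshold(w, delta):
    """Shrink every entry of w toward zero by lam = delta * max|w| and set
    entries smaller than lam to exactly zero.
    ASSUMPTION: delta in [0, 1) is a fraction of the largest absolute weight."""
    lam = delta * np.max(np.abs(w))
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)


def st_pls_fit(X, y, n_comp=2, delta=0.7):
    """Hypothetical sketch: PLS1 with NIPALS-style deflation on a 0/1 class
    response, soft-thresholding the loading weights of each component."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E = X - x_mean  # centred (later deflated) predictors
    f = y - y_mean  # centred (later deflated) response
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = soft_threshold(E.T @ f, delta)  # sparse loading weights
        norm = np.linalg.norm(w)
        if norm == 0.0:  # nothing left to explain
            break
        w = w / norm
        t = E @ w                # scores of this component
        tt = t @ t
        p = E.T @ t / tt         # X loadings
        qk = (f @ t) / tt        # y loading
        E = E - np.outer(t, p)   # deflate X
        f = f - qk * t           # deflate y
        W.append(w)
        P.append(p)
        q.append(qk)
    if not W:
        return x_mean, y_mean, np.zeros(X.shape[1])
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    # Regression coefficients expressed in the original variable space.
    beta = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, beta


def st_pls_predict(model, X, cutoff=0.5):
    """Assign class 1 when the fitted 0/1 response exceeds `cutoff` (assumed rule)."""
    x_mean, y_mean, beta = model
    return ((X - x_mean) @ beta + y_mean > cutoff).astype(int)


# Usage on synthetic data where only variable 0 carries class information.
rng = np.random.default_rng(0)
n, p = 40, 50
y = np.repeat([0.0, 1.0], n // 2)
X = rng.normal(size=(n, p))
X[:, 0] += 3.0 * y
model = st_pls_fit(X, y, n_comp=2, delta=0.7)
beta = model[2]
print("non-zero coefficients:", np.count_nonzero(beta), "of", p)
print("training accuracy:", np.mean(st_pls_predict(model, X) == y))
```

With `delta = 0` the thresholding is switched off and the sketch reduces to ordinary PLS1; larger values of `delta` give sparser loading weights and hence fewer selected variables.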
Pages: 54-62
Number of pages: 9