Classification using partial least squares with penalized logistic regression

Cited by: 121
Authors
Fort, G [1]
Lambert-Lacroix, S [1]
Affiliations
[1] CNRS, LMC IMAG, F-38041 Grenoble, France
DOI
10.1093/bioinformatics/bti114
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: One important aspect of data mining of microarray data is to discover the molecular variation among cancers. In microarray studies, the number n of samples is relatively small compared to the number p of genes per sample (usually in the thousands). It is known that standard statistical methods in classification are efficient (i.e. in the present case, yield successful classifiers) particularly when n is (far) larger than p. This naturally calls for the use of a dimension reduction procedure together with the classification one.
Results: In this paper, the question of classification in such a high-dimensional setting is addressed. We view the classification problem as a regression one with few observations and many predictor variables. We propose a new method combining partial least squares (PLS) and Ridge penalized logistic regression. We review the existing methods based on PLS and/or penalized likelihood techniques, outline their interest in some cases and theoretically explain their sometimes poor behavior. Our procedure is compared with these other classifiers. The predictive performance of the resulting classification rule is illustrated on three data sets: Leukemia, Colon and Prostate.
Pages: 1104-1111
Page count: 8
Related papers
27 in total
[1]  
ALBERT A, 1984, BIOMETRIKA, V71, P1
[2]   Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays [J].
Alon, U ;
Barkai, N ;
Notterman, DA ;
Gish, K ;
Ybarra, S ;
Mack, D ;
Levine, AJ .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1999, 96 (12) :6745-6750
[3]   Effective dimension reduction methods for tumor classification using gene expression data [J].
Antoniadis, A ;
Lambert-Lacroix, S ;
Leblanc, F .
BIOINFORMATICS, 2003, 19 (05) :563-570
[4]  
DEVROYE L, 1996, THEORY PATTERN RECOG
[5]  
DING B, 2004, 5 BIOC PROJ WORK PAP
[6]   Comparison of discrimination methods for the classification of tumors using gene expression data [J].
Dudoit, S ;
Fridlyand, J ;
Speed, TP .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2002, 97 (457) :77-87
[7]   Classification of microarray data with penalized logistic regression [J].
Eilers, PHC ;
Boer, JM ;
van Ommen, GJ ;
van Houwelingen, HC .
MICROARRAYS: OPTICAL TECHNOLOGIES AND INFORMATICS, 2001, 4266 :187-198
[8]  
Fahrmeir L., 2001, SPRINGER SERIES STAT, V2nd
[9]   A STATISTICAL VIEW OF SOME CHEMOMETRICS REGRESSION TOOLS [J].
FRANK, IE ;
FRIEDMAN, JH .
TECHNOMETRICS, 1993, 35 (02) :109-135
[10]   Support vector machine classification and validation of cancer tissue samples using microarray expression data [J].
Furey, TS ;
Cristianini, N ;
Duffy, N ;
Bednarski, DW ;
Schummer, M ;
Haussler, D .
BIOINFORMATICS, 2000, 16 (10) :906-914