Classification of array CGH data using smoothed logistic regression model

Cited by: 8
Authors
Huang, Jian [2 ]
Salim, Agus [3 ]
Lei, Kaibin
O'Sullivan, Kathleen [2 ]
Pawitan, Yudi [1 ]
Affiliations
[1] Karolinska Inst, Dept Med Epidemiol & Biostat, Stockholm, Sweden
[2] Natl Univ Ireland Univ Coll Cork, Stat Lab, Cork, Ireland
[3] Natl Univ Singapore, Dept Epidemiol & Publ Hlth, Singapore 117548, Singapore
Keywords
high-throughput data; genomics; cancer; machine learning; cross-validation; SEGMENTATION; MICROARRAYS;
DOI
10.1002/sim.3753
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Array comparative genomic hybridization (aCGH) provides genome-wide information on DNA copy number that is potentially useful for disease classification. One immediate problem is that the data contain many features (probes) but only a few samples. Existing approaches to this problem include feature selection, ridge regression and partial least squares. However, these methods typically ignore the spatial structure of aCGH data, in which neighbouring probes along the genome tend to share copy-number status. To make explicit use of this spatial information, we develop a procedure called the smoothed logistic regression (SLR) model. The procedure is based on a mixed logistic regression model in which the random component follows a mixture distribution that controls smoothness and sparseness. Conceptually such a procedure is straightforward, but its implementation is complicated by computational problems. We develop a fast and reliable iterative weighted least-squares algorithm based on the singular value decomposition. Simulated data and two real data sets are used to illustrate the procedure. For the real data sets, error rates are calculated using leave-one-out cross-validation. For both the simulated and the real data examples, SLR achieves lower misclassification error rates than previous methods. Copyright (C) 2009 John Wiley & Sons, Ltd.
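
The general idea in the abstract (logistic regression whose probe coefficients are encouraged to vary smoothly along the genome, fitted by iterative weighted least squares and assessed by leave-one-out cross-validation) can be illustrated with a minimal sketch. This is not the authors' SLR implementation: it assumes a plain quadratic second-difference roughness penalty in place of the paper's mixture random-effect distribution, solves each weighted least-squares step with a direct linear solve rather than the SVD, and uses simulated data. All function names and the parameters `lam` and `ridge` are hypothetical.

```python
import numpy as np

def second_difference_matrix(p):
    """(p-2) x p matrix D with (D @ b)[j] = b[j] - 2*b[j+1] + b[j+2]."""
    D = np.zeros((p - 2, p))
    for j in range(p - 2):
        D[j, j:j + 3] = (1.0, -2.0, 1.0)
    return D

def fit_smoothed_logistic(X, y, lam=10.0, ridge=0.1, n_iter=50, tol=1e-6):
    """Penalized logistic regression fitted by iterative weighted least squares.

    The penalty lam * ||D b||^2 + ridge * ||b||^2 is an assumed stand-in for
    the paper's mixture prior; the intercept (beta[0]) is left unpenalized.
    """
    n, p = X.shape
    Xa = np.hstack([np.ones((n, 1)), X])          # prepend intercept column
    D = second_difference_matrix(p)
    P = np.zeros((p + 1, p + 1))
    P[1:, 1:] = lam * D.T @ D + ridge * np.eye(p)
    beta = np.zeros(p + 1)
    for _ in range(n_iter):
        eta = np.clip(Xa @ beta, -30.0, 30.0)     # guard against overflow
        mu = 1.0 / (1.0 + np.exp(-eta))
        w = np.clip(mu * (1.0 - mu), 1e-6, None)  # working weights
        z = eta + (y - mu) / w                    # working response
        A = Xa.T @ (w[:, None] * Xa) + P
        new_beta = np.linalg.solve(A, Xa.T @ (w * z))
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta

def predict_proba(beta, X):
    eta = np.clip(beta[0] + X @ beta[1:], -30.0, 30.0)
    return 1.0 / (1.0 + np.exp(-eta))

def loocv_error(X, y, **fit_kwargs):
    """Leave-one-out misclassification rate: refit with each sample held out."""
    n = len(y)
    mistakes = 0
    for i in range(n):
        keep = np.arange(n) != i
        beta = fit_smoothed_logistic(X[keep], y[keep], **fit_kwargs)
        pred = predict_proba(beta, X[i:i + 1])[0] >= 0.5
        mistakes += int(pred != bool(y[i]))
    return mistakes / n

def simulate(n=40, p=60, seed=0):
    """Toy aCGH-like data: class 1 carries a gain over probes 20..29."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0.0, 1.0], n // 2)
    X = rng.normal(0.0, 0.5, size=(n, p))
    X[y == 1, 20:30] += 1.5
    return X, y

if __name__ == "__main__":
    X, y = simulate()
    print("LOOCV error:", loocv_error(X, y, lam=10.0))
```

In this sketch, a larger `lam` forces neighbouring probe coefficients toward a smoother profile, which is one simple way to exploit the spatial ordering of probes when there are more probes than samples.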
Pages: 3798-3810
Page count: 13