Classification of arrayCGH data using fused SVM

Cited by: 56
Authors
Rapaport, Franck [1, 2, 3]
Barillot, Emmanuel [1, 2, 3]
Vert, Jean-Philippe [1, 2, 3]
Affiliations
[1] Inst Curie, Ctr Rech, F-75248 Paris, France
[2] INSERM, U900, F-75248 Paris, France
[3] Ecole Mines, Ctr Computat Biol, F-77305 Fontainebleau, France
DOI
10.1093/bioinformatics/btn188
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Array-based comparative genomic hybridization (arrayCGH) has recently become a popular tool to identify DNA copy number variations along the genome. These profiles are starting to be used as markers to improve prognosis or diagnosis of cancer, which implies that methods for automated supervised classification of arrayCGH data are needed. Like gene expression profiles, arrayCGH profiles are characterized by a large number of variables usually measured on a limited number of samples. However, arrayCGH profiles have a particular structure of correlations between variables, due to the spatial organization of bacterial artificial chromosomes along the genome. This suggests that classical classification methods, often based on the selection of a small number of discriminative features, may not be the most accurate methods and may not produce easily interpretable prediction rules.

Results: We propose a new method for supervised classification of arrayCGH data. The method is a variant of support vector machine that incorporates the biological specificities of DNA copy number variations along the genome as prior knowledge. The resulting classifier is a sparse linear classifier based on a limited number of regions automatically selected on the chromosomes, leading to easy interpretation and identification of discriminative regions of the genome. We test this method on three classification problems for bladder and uveal cancer, involving both diagnosis and prognosis. We demonstrate that the introduction of the new prior on the classifier leads not only to more accurate predictions, but also to the identification of known and new regions of interest in the genome.
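
The abstract does not state the optimization problem, so the following is only a rough sketch of how a sparse, region-based linear classifier of this kind might be trained. It assumes a penalized form: hinge loss plus an L1 penalty (sparsity) plus a penalty on differences between weights of neighbouring probes (piecewise-constant regions), minimized by plain subgradient descent. The function names, the penalty weights `lam` and `mu`, the solver, and the toy data are all illustrative assumptions, not the authors' implementation.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sign(v):
    return (v > 0) - (v < 0)

def objective(w, b, X, y, lam, mu):
    # Mean hinge loss + L1 penalty + penalty on jumps between adjacent weights.
    hinge = sum(max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)) / len(X)
    l1 = sum(abs(wj) for wj in w)
    tv = sum(abs(w[j + 1] - w[j]) for j in range(len(w) - 1))
    return hinge + lam * l1 + mu * tv

def fit_fused_svm(X, y, lam=0.01, mu=0.01, steps=300, lr=0.1):
    # Hypothetical solver: subgradient descent with a 1/sqrt(t) step size,
    # returning the iterate with the lowest objective seen so far.
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    best_obj, best_w, best_b = objective(w, b, X, y, lam, mu), w[:], b
    for t in range(1, steps + 1):
        grad_w = [lam * sign(wj) for wj in w]
        for j in range(p - 1):
            s = mu * sign(w[j + 1] - w[j])
            grad_w[j + 1] += s
            grad_w[j] -= s
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1.0:  # sample violates the margin
                for j in range(p):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        step = lr / t ** 0.5
        w = [wj - step * g for wj, g in zip(w, grad_w)]
        b -= step * grad_b
        obj = objective(w, b, X, y, lam, mu)
        if obj < best_obj:
            best_obj, best_w, best_b = obj, w[:], b
    return best_w, best_b

def make_toy_profiles(n=40, p=30, start=10, stop=15, seed=0):
    # Synthetic stand-in for arrayCGH profiles (an assumption for illustration):
    # one contiguous block of probes is shifted up or down according to the class.
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        row = [rng.gauss(0.0, 0.3) for _ in range(p)]
        for j in range(start, stop):
            row[j] += label
        X.append(row)
        y.append(label)
    return X, y

def predict(w, b, x):
    return 1 if dot(w, x) + b >= 0 else -1
```

Calling `fit_fused_svm(*make_toy_profiles())` returns a weight vector and an intercept. On the toy data, the weights inside the shifted block should dominate, which mirrors the idea of a classifier built on a few contiguous genomic regions. A production solver would more likely use a dedicated convex optimization routine than this simple loop.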
Pages: i375-i382
Page count: 8