Semi-supervised learning improves gene expression-based prediction of cancer recurrence

Cited by: 65
Authors
Shi, Mingguang [1]
Zhang, Bing [1,2]
Affiliations
[1] Vanderbilt Univ, Sch Med, Dept Biomed Informat, Nashville, TN 37232 USA
[2] Vanderbilt Univ, Sch Med, Dept Canc Biol, Nashville, TN 37232 USA
Funding
National Institutes of Health (USA)
Keywords
COLORECTAL-CANCER; SIGNATURE; SURVIVAL; CLASSIFICATION; MICROARRAY; OUTCOMES
DOI
10.1093/bioinformatics/btr502
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Gene expression profiling has shown great potential in outcome prediction for different types of cancer. Nevertheless, small sample size remains a bottleneck in obtaining robust and accurate classifiers. Traditional supervised learning techniques can only work with labeled data. Consequently, a large amount of microarray data that lacks sufficient follow-up information is disregarded. To fully leverage all of the precious data in public databases, we turned to a semi-supervised learning technique, low density separation (LDS).
Results: Using the clinically important question of predicting recurrence risk in colorectal cancer patients, we demonstrated that (i) semi-supervised classification improved prediction accuracy compared with the state-of-the-art supervised method, the support vector machine (SVM); (ii) the performance gain increased with the number of unlabeled samples; (iii) unlabeled data from different institutes could be employed after appropriate processing; and (iv) the LDS method is robust with regard to the number of input features. To test the general applicability of this semi-supervised method, we further applied LDS to human breast cancer datasets and also observed superior performance. Our results demonstrated the great potential of semi-supervised learning in gene expression-based outcome prediction for cancer patients.
Pages: 3017-3023
Number of pages: 7