Convergent random forest predictor: Methodology for predicting drug response from genome-scale data applied to anti-TNF response

Cited by: 39
Authors
Bienkowska, Jadwiga R. [1 ]
Dalgin, Gul S.
Batliwalla, Franak [2 ]
Allaire, Normand
Roubenoff, Ronenn
Gregersen, Peter K. [2 ]
Carulli, John P.
Affiliations
[1] Biogen Idec Inc, Mol Profiling, Cambridge, MA 02142 USA
[2] N Shore LIJ Hlth Syst, Feinstein Inst Med Res, Manhasset, NY USA
Keywords
TNF-block therapy; Drug response prediction; Classifiers; CANCER; CLASSIFICATION; VALIDATION; GENES
DOI
10.1016/j.ygeno.2009.08.008
Chinese Library Classification (CLC) codes
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Biomarker development for prediction of patient response to therapy is one of the goals of molecular profiling of human tissues. Due to the large number of transcripts, relatively limited number of samples, and high variability of data, identification of predictive biomarkers is a challenge for data analysis. Furthermore, many genes may be responsible for drug response differences, but often only a few are sufficient for accurate prediction. Here we present an analysis approach, the Convergent Random Forest (CRF) method, for the identification of highly predictive biomarkers. The aim is to select from genome-wide expression data a small number of non-redundant biomarkers that could be developed into a simple and robust diagnostic tool. Our method combines the Random Forest classifier and gene expression clustering to rank and select a small number of predictive genes. We evaluated the CRF approach by analyzing four different data sets. The first set contains transcript profiles of whole blood from rheumatoid arthritis patients, collected before anti-TNF treatment, and their subsequent response to the therapy. In this set, CRF identified 8 transcripts predicting response to therapy with 89% accuracy. We also applied the CRF to the analysis of three previously published expression data sets. For all sets, we compared the CRF and recursive support vector machines (RSVM) approaches to feature selection and classification. In all cases the CRF selects a much smaller number of features, five to eight genes, while achieving similar or better performance on both training and independent testing sets of data. For both methods, performance estimates using cross-validation are similar to performance on independent samples. The method has been implemented in R and is available from the authors upon request: Jadwiga.Bienkowska@biogenidec.com. (C) 2009 Elsevier Inc. All rights reserved.
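The abstract describes ranking genes with a classifier and using expression clustering to keep only a small, non-redundant set. The following is a minimal sketch of that general idea, not the authors' CRF implementation (which is in R and is not reproduced here). It assumes importance scores have already been computed by some classifier, and it swaps in a simple greedy correlation filter in place of a real clustering step. The function names, the gene names, the toy data, and the `corr_threshold` and `max_genes` parameters are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_nonredundant(expression, importance, max_genes=8, corr_threshold=0.8):
    """Greedily keep the top-ranked genes that are not strongly correlated
    with any gene already kept.

    expression: dict mapping gene name -> expression values across samples
    importance: dict mapping gene name -> importance score (assumed to come
                from a classifier such as a random forest; hypothetical input)
    """
    ranked = sorted(importance, key=importance.get, reverse=True)
    selected = []
    for gene in ranked:
        if len(selected) >= max_genes:
            break
        redundant = any(
            abs(pearson(expression[gene], expression[kept])) >= corr_threshold
            for kept in selected
        )
        if not redundant:
            selected.append(gene)
    return selected


# Toy data (hypothetical): geneB tracks geneA closely, geneC does not.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 4.0, 6.2, 8.1],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
importance = {"geneA": 0.5, "geneB": 0.4, "geneC": 0.1}

selected = select_nonredundant(expression, importance)
print(selected)
```

On this toy input, geneB is skipped because it is nearly collinear with the higher-ranked geneA, while geneC is kept despite its lower score. In practice the threshold and the redundancy criterion would need to be chosen and validated for the data set at hand.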
Pages: 423-432
Number of pages: 10