Can Survival Prediction Be Improved By Merging Gene Expression Data Sets?

Cited by: 43
Authors
Yasrebi, Haleh
Sperisen, Peter
Praz, Viviane
Bucher, Philipp
Affiliations
[1] Swiss Institute for Experimental Cancer Research (ISREC), Swiss Federal Institute of Technology (EPFL), School of Life Sciences, Lausanne
[2] Swiss Institute of Bioinformatics, EPFL SV ISREC, Lausanne
Source
PLOS ONE | 2009 / Volume 4 / Issue 10
Keywords
BREAST-CANCER; MICROARRAY DATA; ESTROGEN-RECEPTOR; HISTOLOGIC GRADE; MARKER GENES; SIGNATURE; PLATFORM; CLASSIFICATION; CARCINOMAS; SUBTYPES;
DOI
10.1371/journal.pone.0007431
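Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Subject classification code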
07 ; 0710 ; 09 ;
Abstract
Background: High-throughput gene expression profiling technologies, which generate a wealth of data, are increasingly used to characterize tumor biopsies in clinical trials. By applying machine learning algorithms to such clinically documented data sets, one hopes to improve tumor diagnosis, prognosis, and prediction of treatment response. However, the limited number of patients enrolled in a single trial limits the power of machine learning approaches because of over-fitting. One could partially overcome this limitation by merging data from different studies. Nevertheless, such data sets differ from each other with regard to technical biases, patient selection criteria, and follow-up treatment. It is therefore unclear whether the advantage of increased sample size outweighs the disadvantage of higher heterogeneity in merged data sets. Here, we present a systematic study to answer this question specifically for breast cancer data sets. We use survival prediction based on Cox regression as an assay to measure the added value of merged data sets.
Results: Using the time-dependent Receiver Operating Characteristic Area Under the Curve (ROC-AUC) and the hazard ratio as performance measures, we see overall no significant improvement or deterioration of survival prediction with merged data sets compared to individual data sets. This was apparently because a few genes with strong prognostic power were not available on all microarray platforms and thus were not retained in the merged data sets. Surprisingly, we found that the overall best performance was achieved with a single-gene predictor consisting of CYB5D1.
Conclusions: Merging did not deteriorate performance on average despite (a) the diversity of microarray platforms used, (b) the heterogeneity of patient cohorts, (c) the heterogeneity of breast cancer disease, (d) substantial variation in time to death or relapse, and (e) the reduced number of genes in the merged data sets. Predictors derived from the merged data sets were more robust, consistent, and reproducible across microarray platforms. Moreover, merging data sets from different studies helps to better understand the biases of individual studies and can lead to the identification of strong survival factors such as CYB5D1 expression.
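The gene-loss effect described in the abstract can be illustrated with a minimal sketch. It assumes that merging keeps only the genes measured in every data set and that each gene is standardized within its own data set before pooling. The abstract does not state the normalization actually used, so the z-scoring step, the function names, and the toy values below are all hypothetical.

```python
from statistics import mean, pstdev


def common_genes(datasets):
    """Return the sorted list of genes present in every data set."""
    gene_sets = [set(ds) for ds in datasets]
    return sorted(set.intersection(*gene_sets))


def zscore(values):
    """Standardize one gene within one data set (assumed normalization)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def merge_datasets(datasets):
    """Pool patients across data sets, keeping only the shared genes.

    Each data set maps a gene symbol to a list of expression values,
    one value per patient.
    """
    genes = common_genes(datasets)
    merged = {g: [] for g in genes}
    for ds in datasets:
        for g in genes:
            merged[g].extend(zscore(ds[g]))
    return merged


# Toy values invented for illustration; they are not from the study.
study_a = {"ESR1": [8.0, 10.0, 12.0], "CYB5D1": [5.0, 6.0, 7.0]}
study_b = {"ESR1": [1.0, 2.0], "ERBB2": [3.0, 4.0]}

merged = merge_datasets([study_a, study_b])
print(sorted(merged))       # genes retained after merging
print(len(merged["ESR1"]))  # patients pooled across both studies
```

In this toy case CYB5D1 is dropped because the second data set does not measure it, which mirrors the abstract's point that a gene with strong prognostic power can be lost when platforms do not share it.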
Pages: 14
Related papers
50 in total
  • [41] Integrative biomarker detection on high-dimensional gene expression data sets: a survey on prior knowledge approaches
    Perscheid, Cindy
    BRIEFINGS IN BIOINFORMATICS, 2021, 22 (03)
  • [42] Removing the association of random gene sets and survival time in cancers with positive random bias using fixed-point gene set
    Maghsoudi, Maryam
    Aghdam, Rosa
    Eslahchi, Changiz
    SCIENTIFIC REPORTS, 2023, 13 (01)
  • [43] Classification and survival prediction in early-stage cirrhosis by gene expression profiling
    Wang, Qingliang
    Li, Xiaojie
    Chen, Yaqiong
    Gong, Jiao
    Hu, Bo
    JOURNAL OF VIRAL HEPATITIS, 2023, 30 (02) : 116 - 128
  • [44] Improvement of Survival Prediction from Gene Expression Profiles by Mining of Prior Knowledge
    Ren, Siyuan
    Obradovic, Zoran
    2008 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, PROCEEDINGS, 2008, : 372 - 375
  • [45] Prediction of Disease-free Survival in Hepatocellular Carcinoma by Gene Expression Profiling
    Lim, Ho-Yeong
    Sohn, Insuk
    Deng, Shibing
    Lee, Jeeyun
    Jung, Sin Ho
    Mao, Mao
    Xu, Jiangchun
    Wang, Kai
    Shi, Stephanie
    Joh, Jae Won
    Choi, Yoon La
    Park, Cheol-Keun
    ANNALS OF SURGICAL ONCOLOGY, 2013, 20 (12) : 3747 - 3753
  • [46] Pan-cancer evaluation of gene expression and somatic alteration data for cancer prognosis prediction
    Zheng, Xingyu
    Amos, Christopher, I
    Frost, H. Robert
    BMC CANCER, 2021, 21 (01)
  • [47] Combining gene expression signature with clinical features for survival stratification of gastric cancer
    Sun, Qiang
    Guo, Dongyang
    Li, Shuang
    Xu, Yanjun
    Jiang, Mingchun
    Li, Yang
    Duan, Huilong
    Zhuo, Wei
    Liu, Wei
    Zhu, Shankuan
    Wang, Liangjing
    Zhou, Tianhua
    GENOMICS, 2021, 113 (04) : 2683 - 2694
  • [48] Cross-Platform Analysis with Binarized Gene Expression Data
    Tuna, Salih
    Niranjan, Mahesan
    PATTERN RECOGNITION IN BIOINFORMATICS, PROCEEDINGS, 2009, 5780 : 439 - 449
  • [49] Distance in cancer gene expression from stem cells predicts patient survival
    Riester, Markus
    Wu, Hua-Jun
    Zehir, Ahmet
    Gonen, Mithat
    Moreira, Andre L.
    Downey, Robert J.
    Michor, Franziska
    PLOS ONE, 2017, 12 (03)
  • [50] Breast Cancer Survival Prediction Modeling Based on Genomic Data: An Improved Prognosis-Driven Deep Learning Approach
    Mahmoud, Amena
    Alhussein, Musaed
    Aurangzeb, Khursheed
    Takaoka, Eiko
    IEEE ACCESS, 2024, 12 : 119502 - 119519