MetaKTSP: a meta-analytic top scoring pair method for robust cross-study validation of omics prediction analysis

Cited by: 31
Authors
Kim, SungHwan [1,2]
Lin, Chien-Wei [1]
Tseng, George C. [1,3,4]
Affiliations
[1] Univ Pittsburgh, Dept Biostat, Pittsburgh, PA 15261 USA
[2] Korea Univ, Dept Stat, Seoul, South Korea
[3] Univ Pittsburgh, Dept Computat & Syst Biol, Pittsburgh, PA 15260 USA
[4] Univ Pittsburgh, Dept Human Genet, Pittsburgh, PA 15260 USA
Funding
National Research Foundation, Singapore;
Keywords
INTER-PLATFORM COMPARABILITY; CANDIDATE TUMOR-SUPPRESSOR; BREAST-CANCER PATIENTS; GENE-EXPRESSION; MICROARRAY DATA; ADJUSTMENT; BIOMARKER; PATTERNS; INDEX; RISK;
DOI
10.1093/bioinformatics/btw115
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Supervised machine learning is widely applied to transcriptomic data to predict disease diagnosis, prognosis or survival. Robust and interpretable classifiers with high accuracy are usually favored for their clinical and translational potential. The top scoring pair (TSP) algorithm is one example: it applies a simple rank-based rule to identify gene pairs whose relative ordering changes between classes and builds the classifier from those pairs. Although many classification methods perform well in cross-validation within a single expression profile, performance usually drops substantially in cross-study validation (i.e. the prediction model is established in a training study and applied to an independent test study) for all machine learning methods, including TSP. This failure of cross-study validation has largely diminished the potential translational and clinical value of the models. The purpose of this article is to develop a meta-analytic top scoring pair (MetaKTSP) framework that combines multiple transcriptomic studies and generates a robust prediction model applicable to independent test studies.
Results: We proposed two frameworks, one averaging TSP scores across studies and one combining P-values from individual studies, to select the top gene pairs for model construction. We applied the proposed methods to simulated data sets and to three large-scale real applications in breast cancer, idiopathic pulmonary fibrosis and pan-cancer methylation. The results showed superior cross-study validation accuracy and biomarker selection for the new meta-analytic framework. In conclusion, combining multiple omics data sets in the public domain increases the robustness and accuracy of the classification model, which will ultimately improve disease understanding and clinical treatment decisions to benefit patients.
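The score-averaging idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `tsp_signed_scores` and `meta_top_pair`, the toy data, and the choice to average signed score differences before taking the absolute value are assumptions made here for illustration. It uses the standard TSP score for a gene pair (i, j), namely the difference between classes in the proportion of samples with gene i expressed below gene j, and selects only a single top pair rather than K pairs.

```python
import numpy as np

def tsp_signed_scores(X, y):
    """Signed TSP score for every gene pair in one study.

    X: array of shape (genes, samples); y: 0/1 class labels per sample.
    Entry (i, j) is P(X_i < X_j | y=0) - P(X_i < X_j | y=1).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # less[i, j, s] is True when gene i is below gene j in sample s.
    less = X[:, None, :] < X[None, :, :]
    p0 = less[:, :, y == 0].mean(axis=2)
    p1 = less[:, :, y == 1].mean(axis=2)
    return p0 - p1

def meta_top_pair(studies):
    """Average signed scores over studies and return the top gene pair.

    Assumption (hypothetical design choice): averaging the signed scores
    penalizes pairs whose ordering flips direction between studies.
    """
    signed = np.mean([tsp_signed_scores(X, y) for X, y in studies], axis=0)
    score = np.abs(signed)
    i, j = np.unravel_index(np.argmax(score), score.shape)
    return (int(i), int(j)), float(score[i, j])

# Toy data: two studies on different measurement scales, three genes each.
# Genes 0 and 1 swap order between classes; gene 2 is always the largest.
labels = np.array([0, 0, 1, 1])
study_a = (np.array([[1, 1, 5, 5],
                     [5, 5, 1, 1],
                     [10, 10, 10, 10]]), labels)
study_b = (np.array([[10, 20, 300, 400],
                     [100, 200, 30, 40],
                     [1000, 1000, 1000, 1000]]), labels)

pair, score = meta_top_pair([study_a, study_b])
print(pair, score)
```

Because the score depends only on within-sample orderings, the two toy studies can be combined without any cross-study normalization, which is the property that makes rank-based pairs attractive for cross-study validation.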
Pages: 1966-1973
Number of pages: 8