Surrogate variable analysis using partial least squares (SVA-PLS) in gene expression studies

Cited by: 72
Authors
Chakraborty, Sutirtha [1]
Datta, Somnath [1]
Datta, Susmita [1]
Affiliations
[1] Univ Louisville, Dept Bioinformat & Biostat, Louisville, KY 40202 USA
Keywords
PANCREATIC-CANCER; BRCA2; MUTATIONS; CD44; SUSCEPTIBILITY; CONFOUNDERS; LEUKEMIA; BREAST
DOI
10.1093/bioinformatics/bts022
CLC number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: In a typical gene expression profiling study, the prime objective is to identify the genes that are differentially expressed between samples from two different tissue types. Commonly, standard analysis of variance (ANOVA) or regression is applied to estimate the relative effects of these genes across the two types of samples from their respective arrays of expression levels. However, this technique becomes fundamentally flawed when the arrays contain unaccounted sources of variability (latent variables attributable to biological, environmental or other factors relevant in the context). These factors distort the true picture of differential gene expression between the two tissue types and introduce spurious signals of expression heterogeneity. As a result, many genes that are actually differentially expressed go undetected, whereas many others are falsely identified as positives. Moreover, these distortions can differ from gene to gene, so simple array normalization cannot remove them. This two-way error can lead to a serious loss in sensitivity and specificity, and thus to severe inefficiency in the underlying multiple testing problem. In this work, we attempt to identify the hidden effects of the underlying latent factors by partial least squares (PLS) and then apply analysis of covariance (ANCOVA) with the PLS-identified signatures of these hidden effects as covariates, in order to identify the genes that are truly differentially expressed between the two tissue types.
Results: We compare the performance of our method, SVA-PLS, with standard ANOVA and with surrogate variable analysis (SVA), a relatively recent technique, across a wide variety of simulation settings (incorporating different effects of the hidden variable, under varying signal intensities and gene groupings). In all settings, our method yields the highest sensitivity while maintaining reasonable values for specificity, false discovery rate and false non-discovery rate. Applied to gene expression profiling for acute megakaryoblastic leukemia, our method detects six additional genes that are missed by both standard ANOVA and SVA but that may be relevant to this disease, as suggested by mining the existing literature.
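The four criteria named in the Results (sensitivity, specificity, false discovery rate and false non-discovery rate) can be computed from a list of true gene states and a list of calls, as in a simulation where the truth is known. The following is a minimal sketch only: the function name `error_rates`, the toy inputs and the convention of returning 0.0 for an empty denominator are assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, Sequence


def error_rates(truth: Sequence[bool], called: Sequence[bool]) -> Dict[str, float]:
    """Summarize gene calls against known truth (hypothetical helper).

    truth[i]  is True if gene i is really differentially expressed.
    called[i] is True if the method declared gene i differentially expressed.
    """
    if len(truth) != len(called):
        raise ValueError("truth and called must have the same length")

    pairs = list(zip(truth, called))
    tp = sum(1 for t, c in pairs if t and c)          # true positives
    fp = sum(1 for t, c in pairs if not t and c)      # false positives
    fn = sum(1 for t, c in pairs if t and not c)      # false negatives
    tn = sum(1 for t, c in pairs if not t and not c)  # true negatives

    def ratio(num: int, den: int) -> float:
        # Assumed convention: an empty denominator yields 0.0.
        return num / den if den else 0.0

    return {
        "sensitivity": ratio(tp, tp + fn),  # detected share of true positives
        "specificity": ratio(tn, tn + fp),  # correctly ignored share of nulls
        "fdr": ratio(fp, fp + tp),          # false share among discoveries
        "fnr": ratio(fn, fn + tn),          # missed share among non-discoveries
    }


# Toy example: 3 truly differential genes, 5 null genes.
truth = [True, True, True, False, False, False, False, False]
called = [True, True, False, True, False, False, False, False]
rates = error_rates(truth, called)
```

Here `rates` holds one value per criterion, so competing methods can be compared on the same simulated data by calling `error_rates` once per method.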
Pages: 799-806
Page count: 8