Meta-analysis methods for combining multiple expression profiles: comparisons, statistical characterization and an application guideline

Cited by: 102
Authors
Chang, Lun-Ching [1]
Lin, Hui-Min [1]
Sibille, Etienne [2]
Tseng, George C. [1, 3]
Affiliations
[1] Univ Pittsburgh, Grad Sch Publ Hlth, Dept Biostat, Pittsburgh, PA 15261 USA
[2] Univ Pittsburgh, Sch Med, Dept Psychiat, Pittsburgh, PA USA
[3] Univ Pittsburgh, Grad Sch Publ Hlth, Dept Human Genet, Pittsburgh, PA 15261 USA
Funding
US National Institutes of Health (NIH);
Keywords
MICROARRAY METAANALYSIS; QUALITY-CONTROL; PACKAGE; GENES;
DOI
10.1186/1471-2105-14-368
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: As high-throughput genomic technologies have become accurate and affordable, a growing number of data sets have accumulated in the public domain, and genomic information integration and meta-analysis have become routine in biomedical research. In this paper, we focus on microarray meta-analysis, where multiple microarray studies with relevant biological hypotheses are combined to improve candidate marker detection. Many methods have been developed and applied in the literature, but their performance and properties have been only minimally investigated. There is currently no clear conclusion or guideline on the proper choice of a meta-analysis method for a given application; the decision requires both statistical and biological considerations.
Results: We applied 12 microarray meta-analysis methods to combine multiple simulated expression profiles. The methods can be categorized by the hypothesis setting they target for detecting differentially expressed (DE) genes: (1) HSA: DE genes with non-zero effect sizes in all studies, (2) HSB: DE genes with non-zero effect sizes in one or more studies, and (3) HSr: DE genes with non-zero effect sizes in a "majority" of studies. We then performed a comprehensive comparative analysis on six large-scale real applications using four quantitative statistical evaluation criteria: detection capability, biological association, stability and robustness. We elucidated the hypothesis settings behind the methods and further applied multi-dimensional scaling (MDS) and an entropy measure to characterize the meta-analysis methods and the data structure, respectively.
Conclusions: The aggregated results from the simulation study categorized the 12 methods into three hypothesis settings (HSA, HSB, and HSr). Evaluation in real data, together with the results from the MDS and entropy analyses, provided an insightful and practical guideline for choosing the most suitable method in a given application.
All source files for simulation and real data are available on the author's publication website.
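The three hypothesis settings can be illustrated with classical p-value combination rules. The sketch below is not taken from the authors' source files, and the abstract does not name the 12 methods; the function names and the example p-values are hypothetical. It assumes independent studies with p-values that are uniform under the null, and it pairs the maximum p-value with HSA, the minimum p-value and Fisher's method with HSB, and the r-th ordered p-value with HSr purely for illustration.

```python
import math

def rth_ordered_p(pvals, r):
    """Combined p-value from the r-th smallest of k independent p-values.

    Under the null the r-th order statistic of k uniforms follows
    Beta(r, k - r + 1), whose CDF is a binomial tail sum.
    """
    k = len(pvals)
    x = sorted(pvals)[r - 1]
    return sum(math.comb(k, j) * x**j * (1 - x)**(k - j) for j in range(r, k + 1))

def max_p(pvals):
    """HSA-style rule: small only if every study has a small p-value."""
    return rth_ordered_p(pvals, len(pvals))

def min_p(pvals):
    """HSB-style rule: small if at least one study has a small p-value."""
    return rth_ordered_p(pvals, 1)

def fisher_p(pvals):
    """Fisher's method: -2 * sum(log p) is chi-square with 2k degrees of freedom.

    For even degrees of freedom the chi-square survival function has a
    closed form, so no external library is needed.
    """
    k = len(pvals)
    half = -sum(math.log(p) for p in pvals)  # half of Fisher's statistic
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(k))

# Hypothetical gene: strong signal in three of four studies, none in the fourth.
pvals = [0.001, 0.002, 0.8, 0.003]
print("maxP  (all studies):     ", max_p(pvals))
print("minP  (one or more):     ", min_p(pvals))
print("Fisher (one or more):    ", fisher_p(pvals))
print("rOP, r=3 (majority):     ", rth_ordered_p(pvals, 3))
```

For this hypothetical gene the all-studies rule is not significant at the 0.05 level, while the one-or-more and majority rules are, which is the kind of disagreement that makes the choice of hypothesis setting matter.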
Pages: 15