Gene set analysis methods: statistical models and methodological differences

Cited by: 89

Authors:

Maciejewski, Henryk [1]

Affiliations:

[1] Wroclaw Univ Technol, Inst Comp Engn Control & Robot, PL-50370 Wroclaw, Poland

Source:

BRIEFINGS IN BIOINFORMATICS | 2014 / Vol. 15 / Issue 04

Keywords:

gene set analysis; high-throughput data; gene expression; GWAS; competitive methods; self-contained methods; ENRICHMENT ANALYSIS; TESTING ASSOCIATION; MICROARRAY DATA; EXPRESSION DATA; GLOBAL TEST; PATHWAYS;

DOI:

10.1093/bib/bbt002

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Discipline classification code:

071010 ; 081704 ;

Abstract:

Many methods of gene set analysis developed in recent years have been compared empirically in a number of comprehensive review articles. Although it is recognized that different methods tend to identify different gene sets as significant, no consensus has been reached as to which method is preferable, and the recommendations are often contradictory. In this article, we group and compare different methods in terms of their methodological assumptions about the definition of a sample and the formulation of the actual null hypothesis. We discuss four models of statistical experiment explicitly or implicitly assumed by most if not all currently available methods of gene set analysis. We analyse the validity of the models in the context of the actual biological experiment. Based on this, we recommend a group of methods that provide biologically interpretable results in a statistically sound way. Finally, we demonstrate how correlated or low signal-to-noise data affect the performance of different methods, measured in terms of the false-positive rate and power.

Citation

Pages: 504-518

Page count: 15

References (27 in total)

[1] A general modular framework for gene set enrichment analysis
Ackermann, Marit
Strimmer, Korbinian
[J]. BMC BIOINFORMATICS, 2009, 10
[2] Significance analysis of functional categories in gene expression studies: a structured permutation approach
Barry, WT
Nobel, AB
Wright, FA
[J]. BIOINFORMATICS, 2005, 21 (09) : 1943 - 1949
[3] Improving gene set analysis of microarray data by SAM-GS
Dinu, Irina
Potter, John D.
Mueller, Thomas
Liu, Qi
Adewale, Adeniyi J.
Jhangri, Gian S.
Einecke, Gunilla
Famulski, Konrad S.
Halloran, Philip
Yasui, Yutaka
[J]. BMC BIOINFORMATICS, 2007, 8 (1)
[4] Dinu I, 2008, CANCER INFORM, V6, P357
[5] ON TESTING THE SIGNIFICANCE OF SETS OF GENES
Efron, Bradley
Tibshirani, Robert
[J]. ANNALS OF APPLIED STATISTICS, 2007, 1 (01) : 107 - 129
[6] Outcome signature genes in breast cancer: is there a unique set?
Ein-Dor, L
Kela, I
Getz, G
Givol, D
Domany, E
[J]. BIOINFORMATICS, 2005, 21 (02) : 171 - 178
[7] Thousands of samples are needed to generate a robust gene list for predicting outcome in cancer
Ein-Dor, L
Zuk, O
Domany, E
[J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2006, 103 (15) : 5923 - 5928
[8] Self-Contained Gene-Set Analysis of Expression Data: An Evaluation of Existing and Novel Methods
Fridley, Brooke L.
Jenkins, Gregory D.
Biernacka, Joanna M.
[J]. PLOS ONE, 2010, 5 (09) : 1 - 9
[9] Analyzing gene expression data in terms of gene sets: methodological issues
Goeman, Jelle J.
Buehlmann, Peter
[J]. BIOINFORMATICS, 2007, 23 (08) : 980 - 987
[10] Testing association of a pathway with survival using gene expression data
Goeman, JJ
Oosting, J
Cleton-Jansen, AM
Anninga, JK
van Houwelingen, HC
[J]. BIOINFORMATICS, 2005, 21 (09) : 1950 - 1957
