Gene set analysis methods: statistical models and methodological differences

Cited by: 89
Authors
Maciejewski, Henryk [1]
Affiliations
[1] Wroclaw Univ Technol, Inst Comp Engn Control & Robot, PL-50370 Wroclaw, Poland
Keywords
gene set analysis; high-throughput data; gene expression; GWAS; competitive methods; self-contained methods; ENRICHMENT ANALYSIS; TESTING ASSOCIATION; MICROARRAY DATA; EXPRESSION DATA; GLOBAL TEST; PATHWAYS;
DOI
10.1093/bib/bbt002
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Many methods of gene set analysis developed in recent years have been compared empirically in a number of comprehensive review articles. Although it is recognized that different methods tend to identify different gene sets as significant, no consensus has been reached as to which method is preferable, and the recommendations are often contradictory. In this article, we group and compare different methods in terms of their methodological assumptions concerning the definition of a sample and the formulation of the actual null hypothesis. We discuss four models of statistical experiment explicitly or implicitly assumed by most, if not all, currently available methods of gene set analysis. We analyse the validity of the models in the context of the actual biological experiment. Based on this, we recommend a group of methods that provide biologically interpretable results in a statistically sound way. Finally, we demonstrate how correlated or low signal-to-noise data affect the performance of different methods, in terms of the false-positive rate and power.
Pages: 504-518
Page count: 15