Avoiding the pitfalls of gene set enrichment analysis with SetRank

Times cited: 87
Authors
Simillion, Cedric [1, 2]
Liechti, Robin [3]
Lischer, Heidi E. L. [1, 5]
Ioannidis, Vassilios [3, 4]
Bruggmann, Remy [1]
Affiliations
[1] Univ Bern, Swiss Inst Bioinformat, Interfac Bioinformat Unit, Baltzerstrasse 6, CH-3012 Bern, Switzerland
[2] Univ Bern, Dept Clin Res, Murtenstrasse 35, CH-3008 Bern, Switzerland
[3] SIB Swiss Inst Bioinformat, Vital IT, Quartier Sorge Batiment Genopode, CH-1015 Lausanne, Switzerland
[4] SIB Swiss Inst Bioinformat, SIB Technol, Quartier Sorge Batiment Genopode, CH-1015 Lausanne, Switzerland
[5] Univ Zurich, Inst Evolutionary Biol & Environm Studies IEU, URPP Evolut Action, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland
Source
BMC BIOINFORMATICS | 2017 / Vol. 18
Keywords
GSEA; Gene set enrichment analysis; Pathway analysis; Sample source bias; Functional genomics; Algorithm; R package; Web interface; EXPRESSION; MICROARRAY; CATEGORIES; KNOWLEDGE; PATHWAYS; ONTOLOGY; TOOL;
DOI
10.1186/s12859-017-1571-6
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: The purpose of gene set enrichment analysis (GSEA) is to find general trends in the huge lists of genes or proteins generated by many functional genomics techniques and bioinformatics analyses.

Results: Here we present SetRank, an advanced GSEA algorithm which is able to eliminate many false positive hits. The key principle of the algorithm is that it discards gene sets that have initially been flagged as significant, if their significance is only due to the overlap with another gene set. The algorithm is explained in detail and its performance is compared to that of other methods using objective benchmarking criteria. Furthermore, we explore how sample source bias can affect the results of a GSEA analysis.

Conclusions: The benchmarking results show that SetRank is a highly specific tool for GSEA. Furthermore, we show that the reliability of results can be improved by taking sample source bias into account. SetRank is available as an R package and through an online web interface.
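The abstract states SetRank's key principle: a gene set initially flagged as significant is discarded if its significance is due only to its overlap with another gene set. The Python sketch below illustrates that idea under stated assumptions; it is not SetRank's implementation, which is distributed as an R package. It assumes a one-sided hypergeometric over-representation test, and the pairwise discard rule in the hypothetical `filter_overlap_driven` function is a deliberate simplification. All function names and the toy gene identifiers are illustrative only.

```python
from math import comb


def hypergeom_sf(k, universe_size, set_size, n_selected):
    """P(X >= k) for X ~ Hypergeometric(universe_size, set_size, n_selected)."""
    upper = min(set_size, n_selected)
    hits = sum(
        comb(set_size, i) * comb(universe_size - set_size, n_selected - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(universe_size, n_selected)


def enrichment_p(gene_set, selected, universe):
    """One-sided over-representation p-value of the selected genes in gene_set."""
    members = gene_set & universe
    chosen = selected & universe
    overlap = len(members & chosen)
    return hypergeom_sf(overlap, len(universe), len(members), len(chosen))


def filter_overlap_driven(gene_sets, selected, universe, alpha=0.05):
    """Return (p-values, names of significant sets not explained by overlap).

    Hypothetical rule, simpler than SetRank itself: a significant set A is
    discarded if, for some other significant set B that overlaps it, the
    genes of A outside B are no longer significant while the genes of B
    outside A still are. No multiple-testing correction is applied.
    """
    pvals = {name: enrichment_p(genes, selected, universe)
             for name, genes in gene_sets.items()}
    significant = [name for name, p in pvals.items() if p < alpha]
    kept = []
    for a in significant:
        explained_by_other = False
        for b in significant:
            if a == b or not (gene_sets[a] & gene_sets[b]):
                continue  # skip self-comparisons and disjoint pairs
            p_a_only = enrichment_p(gene_sets[a] - gene_sets[b], selected, universe)
            p_b_only = enrichment_p(gene_sets[b] - gene_sets[a], selected, universe)
            if p_a_only >= alpha and p_b_only < alpha:
                explained_by_other = True  # A's signal lies in the overlap with B
                break
        if not explained_by_other:
            kept.append(a)
    return pvals, kept


if __name__ == "__main__":
    universe = {f"g{i}" for i in range(100)}
    selected = {f"g{i}" for i in range(10)}  # e.g. differentially expressed genes
    gene_sets = {
        "overlapper": {f"g{i}" for i in [0, 1, 2, 3, 4, 20, 21, 22, 23, 24]},
        "core": {f"g{i}" for i in [0, 1, 2, 3, 4, 5, 6, 7, 50, 51]},
        "unrelated": {f"g{i}" for i in range(60, 70)},
    }
    pvals, kept = filter_overlap_driven(gene_sets, selected, universe)
    print(kept)  # ['core']
```

In the toy example, "overlapper" is significant on its own, but every one of its selected genes is shared with "core": removing the shared genes leaves nothing significant, whereas "core" keeps signal outside the overlap, so only "core" survives. A real analysis would also need multiple-testing correction and a principled treatment of symmetric cases (where neither set has signal outside the overlap), both of which this sketch leaves out.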
Pages: 14