Avoiding the pitfalls of gene set enrichment analysis with SetRank

Cited by: 87
Authors
Simillion, Cedric [1, 2]
Liechti, Robin [3]
Lischer, Heidi E. L. [1, 5]
Ioannidis, Vassilios [3, 4]
Bruggmann, Remy [1]
Affiliations
[1] Univ Bern, Swiss Inst Bioinformat, Interfac Bioinformat Unit, Baltzerstrasse 6, CH-3012 Bern, Switzerland
[2] Univ Bern, Dept Clin Res, Murtenstrasse 35, CH-3008 Bern, Switzerland
[3] SIB Swiss Inst Bioinformat, Vital IT, Quartier Sorge Batiment Genopode, CH-1015 Lausanne, Switzerland
[4] SIB Swiss Inst Bioinformat, SIB Technol, Quartier Sorge Batiment Genopode, CH-1015 Lausanne, Switzerland
[5] Univ Zurich, Inst Evolutionary Biol & Environm Studies IEU, URPP Evolut Action, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland
Keywords
GSEA; Gene set enrichment analysis; Pathway analysis; Sample source bias; Functional genomics; Algorithm; R package; Web interface; EXPRESSION; MICROARRAY; CATEGORIES; KNOWLEDGE; PATHWAYS; ONTOLOGY; TOOL;
DOI
10.1186/s12859-017-1571-6
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: The purpose of gene set enrichment analysis (GSEA) is to find general trends in the huge lists of genes or proteins generated by many functional genomics techniques and bioinformatics analyses. Results: Here we present SetRank, an advanced GSEA algorithm which is able to eliminate many false positive hits. The key principle of the algorithm is that it discards gene sets that have initially been flagged as significant, if their significance is only due to the overlap with another gene set. The algorithm is explained in detail and its performance is compared to that of other methods using objective benchmarking criteria. Furthermore, we explore how sample source bias can affect the results of a GSEA analysis. Conclusions: The benchmarking results show that SetRank is a highly specific tool for GSEA. Furthermore, we show that the reliability of results can be improved by taking sample source bias into account. SetRank is available as an R package and through an online web interface.
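The key principle stated in the abstract (discarding a gene set whose significance comes only from its overlap with another set) can be illustrated with a minimal sketch. This is not the SetRank implementation: the function names, the one-sided hypergeometric test, the 0.05 threshold, and the toy gene identifiers below are all assumptions chosen for illustration. The idea shown is simply to re-test set A after removing the genes it shares with set B and to check whether A is still enriched.

```python
from math import comb


def hypergeom_pvalue(universe_size, set_size, hits_total, hits_in_set):
    """One-sided P(X >= hits_in_set) for a hypergeometric draw."""
    upper = min(set_size, hits_total)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, hits_total - k)
        for k in range(hits_in_set, upper + 1)
    )
    return tail / comb(universe_size, hits_total)


def enrichment_pvalue(gene_set, significant, universe):
    """Enrichment p-value of gene_set among the significant genes."""
    gene_set = gene_set & universe
    significant = significant & universe
    return hypergeom_pvalue(
        len(universe), len(gene_set), len(significant), len(gene_set & significant)
    )


def survives_overlap_check(set_a, set_b, significant, universe, alpha=0.05):
    """True if set_a is still enriched once genes shared with set_b are removed."""
    return enrichment_pvalue(set_a - set_b, significant, universe) <= alpha


# Hypothetical toy data: 100 genes, of which g0..g9 are flagged as significant.
universe = {f"g{i}" for i in range(100)}
significant = {f"g{i}" for i in range(10)}
set_b = {f"g{i}" for i in range(15)}  # contains all 10 significant genes
set_a = {f"g{i}" for i in range(5, 10)} | {f"g{i}" for i in range(20, 25)}

p_a = enrichment_pvalue(set_a, significant, universe)
a_kept = survives_overlap_check(set_a, set_b, significant, universe)
b_kept = survives_overlap_check(set_b, set_a, significant, universe)
```

In this toy example, set A looks enriched on its own, but all of its significant genes also belong to set B, so A fails the check after the overlap is removed, while B remains enriched without the genes it shares with A.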
Pages: 14