GO2MSIG, an automated GO based multi-species gene set generator for gene set enrichment analysis

Cited by: 30
Author
Powell, Justin Andrew Christiaan [1]
Affiliation
[1] Takeda Cambridge Ltd, Cambridge CB4 0PZ, England
Source
BMC BIOINFORMATICS | 2014 / Volume 15
Keywords
Gene set enrichment analysis (GSEA); GO ontology; Gene set collection; ErmineJ; ONTOLOGY; TOOL; PATHWAY;
DOI
10.1186/1471-2105-15-146
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Despite the widespread use of high throughput expression platforms and the availability of a desktop implementation of Gene Set Enrichment Analysis (GSEA) that enables non-experts to perform gene set based analyses, the necessary precompiled gene sets are rarely available for species other than human.
Results: A software tool (GO2MSIG) was implemented that combines data from various publicly available sources and uses the Gene Ontology (GO) project term relationships to produce GSEA compatible hierarchical GO based gene sets for all species for which association data is available. Annotation sources include the GO association database (which contains data for over 200,000 species), the Entrez gene2go table, and various manufacturers' array annotation files. This enables the creation of gene sets from the most up-to-date annotation data available. Additional features include the ability to restrict by evidence code, to remap gene descriptors, to filter by set size, and to speed up repeat queries by caching the GO term hierarchy. Synonymous GO terms are remapped to the version preferred by the GO ontology supplied. The tool can be used in standalone form or via a web interface. Prebuilt gene set collections constructed from the September 2013 GO release are also available for common species, including human. In contrast, the human GO based sets available from the Broad Institute itself date from 2008.
Conclusions: GO2MSIG enables the bioinformatician and non-bioinformatician alike to generate the gene sets required for GSEA for almost any organism for which GO term association data exists. The output gene sets may be used directly within GSEA, and generating them requires no knowledge of programming languages such as Perl, R or Python. The output sets can also be used with other analysis software, such as ErmineJ, that accepts gene sets in the same format.
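The abstract describes building hierarchical gene sets by following GO term relationships, restricting by evidence code, filtering by set size, and caching the term hierarchy. The Python sketch below illustrates that general idea on toy data only; it is not GO2MSIG's code, and the function names (`ancestors`, `build_gene_sets`, `to_gmt`), the term identifiers, and the gene names are all hypothetical. It assumes the hierarchy is acyclic and writes lines in the tab-separated GMT layout used by GSEA (set name, description, then genes).

```python
from collections import defaultdict


def ancestors(term, parents, cache):
    """Return the term plus all of its ancestors, memoising results in cache.

    Assumes the hierarchy is a directed acyclic graph (no cycles).
    """
    if term not in cache:
        found = {term}
        for parent in parents.get(term, ()):
            found |= ancestors(parent, parents, cache)
        cache[term] = found
    return cache[term]


def build_gene_sets(associations, parents, evidence=None, min_size=1, max_size=None):
    """Propagate gene annotations up the hierarchy, then filter sets by size.

    associations: iterable of (gene, term, evidence_code) tuples.
    evidence: optional set of accepted evidence codes; None accepts all codes.
    """
    cache = {}
    sets = defaultdict(set)
    for gene, term, code in associations:
        if evidence is not None and code not in evidence:
            continue
        # A gene annotated to a term also belongs to every ancestor of that term.
        for t in ancestors(term, parents, cache):
            sets[t].add(gene)
    return {
        t: genes
        for t, genes in sets.items()
        if len(genes) >= min_size and (max_size is None or len(genes) <= max_size)
    }


def to_gmt(gene_sets, names=None):
    """Format gene sets as GMT lines: name, description, then genes, tab separated."""
    names = names or {}
    lines = []
    for term in sorted(gene_sets):
        genes = sorted(gene_sets[term])
        lines.append("\t".join([term, names.get(term, "na"), *genes]))
    return "\n".join(lines)


# Toy hierarchy (child -> parents) with made-up term identifiers.
parents = {"GO:B": ["GO:A"], "GO:C": ["GO:A"], "GO:D": ["GO:B", "GO:C"]}
# Toy associations: (gene, term, evidence code).
associations = [
    ("gene1", "GO:D", "IDA"),
    ("gene2", "GO:B", "IEA"),
    ("gene3", "GO:C", "IDA"),
]

# Keep only IDA-supported annotations and sets with at least two genes.
gene_sets = build_gene_sets(associations, parents, evidence={"IDA"}, min_size=2)
print(to_gmt(gene_sets))
```

With the evidence filter set to `{"IDA"}`, `gene2` is dropped, and only the two terms that collect both `gene1` and `gene3` through propagation pass the minimum size of two. Remapping of synonymous GO terms and of gene descriptors, which the abstract also mentions, is left out of this sketch.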
Pages: 6