Down-weighting overlapping genes improves gene set analysis

Cited by: 109
Authors
Tarca, Adi Laurentiu [1, 2, 3, 4]
Draghici, Sorin [3, 5]
Bhatti, Gaurav [1, 2]
Romero, Roberto [1, 2]
Affiliations
[1] NICHD, NIH, DHHS, Perinatol Res Branch, Bethesda, MD USA
[2] NICHD, NIH, DHHS, Perinatol Res Branch, Detroit, MI USA
[3] Wayne State Univ, Dept Comp Sci, Detroit, MI 48202 USA
[4] Wayne State Univ, Ctr Mol Med & Genet, Detroit, MI 48202 USA
[5] Wayne State Univ, Dept Clin & Translat Sci, Detroit, MI USA
Source
BMC BIOINFORMATICS | 2012 / Vol. 13
Funding
National Science Foundation (USA);
Keywords
Gene expression; Gene set analysis; Pathway analysis; Overlapping gene sets; MICROARRAY DATA; SIGNALING PATHWAYS; EXPRESSION; DISEASE; CANCER; IDENTIFICATION; NORMALIZATION; REGIONS; BIOLOGY;
DOI
10.1186/1471-2105-13-136
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Background: The identification of gene sets that are significantly impacted in a given condition based on microarray data is a crucial step in current life science research. Most gene set analysis methods treat genes equally, regardless of how specific they are to a given gene set.
Results: In this work we propose a new gene set analysis method that computes a gene set score as the mean of absolute values of weighted moderated gene t-scores. The gene weights are designed to emphasize genes that appear in few gene sets over genes that appear in many gene sets. We demonstrate the usefulness of the method when analyzing gene sets that correspond to the KEGG pathways, and hence we called our method Pathway Analysis with Down-weighting of Overlapping Genes (PADOG). Unlike most gene set analysis methods, which are validated through the analysis of 2-3 data sets followed by a human interpretation of the results, the validation employed here uses 24 different data sets and a completely objective assessment scheme that makes minimal assumptions and eliminates the need for possibly biased human assessments of the analysis results.
Conclusions: PADOG significantly improves gene set ranking and boosts sensitivity of analysis using information already available in the gene expression profiles and the collection of gene sets to be analyzed. The advantages of PADOG over other existing approaches are shown to be stable to changes in the database of gene sets to be analyzed. PADOG was implemented as an R package available at: http://bioinformaticsprb.med.wayne.edu/PADOG/ or www.bioconductor.org.
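The scoring idea in the Results can be illustrated with a minimal Python sketch. It is not the PADOG R package. The abstract does not state the weight formula, so the square-root rescaling to the range 1 to 2 used here is an assumption, as are the function names `gene_weights` and `gene_set_scores` and the toy data. The t-scores are taken as given inputs (computing moderated t-scores is outside the sketch), and no significance assessment is attempted.

```python
from collections import Counter
from math import sqrt


def gene_weights(gene_sets):
    """Give higher weight to genes that appear in fewer gene sets.

    ASSUMPTION: weights are rescaled to [1, 2] with a square-root form;
    the abstract only says that rarely shared genes are emphasized.
    """
    # Count in how many gene sets each gene appears.
    freq = Counter(g for genes in gene_sets.values() for g in set(genes))
    f_min, f_max = min(freq.values()), max(freq.values())
    if f_max == f_min:
        # No overlap differences: treat all genes equally.
        return {g: 1.0 for g in freq}
    return {g: 1.0 + sqrt((f_max - f) / (f_max - f_min)) for g, f in freq.items()}


def gene_set_scores(gene_sets, t_scores):
    """Score each gene set as the mean of |t-score| * weight over its genes."""
    weights = gene_weights(gene_sets)
    scores = {}
    for name, genes in gene_sets.items():
        # Only genes with an available t-score contribute.
        members = [g for g in set(genes) if g in t_scores]
        if members:
            total = sum(abs(t_scores[g]) * weights[g] for g in members)
            scores[name] = total / len(members)
    return scores


# Hypothetical toy data: gene g2 is shared by all three sets.
toy_sets = {"A": ["g1", "g2"], "B": ["g2", "g3"], "C": ["g2", "g4"]}
toy_t = {"g1": 2.0, "g2": -3.0, "g3": 0.5, "g4": -1.0}
toy_scores = gene_set_scores(toy_sets, toy_t)
```

In this toy input the shared gene g2 receives the lowest weight, so its large t-score contributes less to every set than it would under equal weighting, while the set-specific genes g1, g3 and g4 count double.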
Pages: 14