A comparative study of topology-based pathway enrichment analysis methods

Cited by: 51
Authors
Ma, Jing [1, 2]
Shojaie, Ali [3]
Michailidis, George [4]
Affiliations
[1] Texas A&M Univ, Dept Stat, College Stn, TX 77840 USA
[2] Fred Hutchinson Canc Res Ctr, Publ Hlth Sci Div, Seattle, WA 98107 USA
[3] Univ Washington, Dept Biostat, Seattle, WA 98105 USA
[4] Univ Florida, Dept Stat, Gainesville, FL 32611 USA
Keywords
Pathway enrichment analysis; Pathway topology; Type I error; Power; Differential network biology; Permutation tests; Gene sets; Expression; Knowledge
DOI
10.1186/s12859-019-3146-1
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Pathway enrichment is extensively used in the analysis of Omics data for gaining biological insights into the functional roles of pre-defined subsets of genes, proteins and metabolites. A large number of methods have been proposed in the literature for this task. The vast majority of these methods use as input the expression levels of the biomolecules under study together with their membership in pathways of interest. The latest generation of pathway enrichment methods also leverages information on the topology of the underlying pathways, which, as evidence from their evaluation reveals, leads to improved sensitivity and specificity. Nevertheless, a systematic empirical comparison of such methods is still lacking, making selection of the most suitable method for a specific experimental setting challenging. This comparative study of nine network-based methods for pathway enrichment analysis aims to provide a systematic evaluation of their performance based on three real data sets with different numbers of features (genes/metabolites) and samples.
Results: The findings highlight both methodological and empirical differences across the nine methods. In particular, certain methods assess pathway enrichment due to differences both in expression levels and in the strength of the interconnectedness of the members of the pathway, while others only leverage differential expression levels. In the more challenging setting involving a metabolomics data set, the results show that methods that utilize both pieces of information (with NetGSA being a prototypical one) exhibit superior statistical power in detecting pathway enrichment.
Conclusion: The analysis reveals that a number of methods perform equally well when testing large pathways, which is the case with genomic data. On the other hand, NetGSA, which takes into consideration both differential expression of the biomolecules in the pathway and changes in the topology, exhibits superior performance when testing small pathways, which is usually the case for metabolomics data.
Pages: 14