Prediction and Validation of Gene-Disease Associations Using Methods Inspired by Social Network Analyses

Cited by: 117
Authors
Singh-Blom, U. Martin [1, 2]
Natarajan, Nagarajan [3]
Tewari, Ambuj [4]
Woods, John O. [1]
Dhillon, Inderjit S. [3]
Marcotte, Edward M. [1, 5]
Affiliations
[1] Univ Texas Austin, Ctr Syst & Synthet Biol, Inst Cellular & Mol Biol, Austin, TX 78712 USA
[2] Karolinska Inst, Dept Med, Stockholm, Sweden
[3] Univ Texas Austin, Dept Comp Sci, Austin, TX 78712 USA
[4] Univ Michigan, Dept Stat, Ann Arbor, MI 48109 USA
[5] Univ Texas Austin, Dept Chem & Biochem, Austin, TX 78712 USA
Source
PLOS ONE | 2013 / Volume 8 / Issue 05
Funding
U.S. National Science Foundation; U.S. National Institutes of Health;
Keywords
GENOME; DATABASE; PRIORITIZATION; IDENTIFICATION; INTEGRATION; PHENOTYPE; RESOURCE; BIOLOGY; WALKING; MODELS;
DOI
10.1371/journal.pone.0058977
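Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract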
Correctly identifying associations of genes with diseases has long been a goal in biology. With the emergence of large-scale gene-phenotype association datasets, we can leverage statistical and machine learning methods to help achieve this goal. In this paper, we present two methods for predicting gene-disease associations based on functional gene associations and gene-phenotype associations in model organisms. The first method, the Katz measure, is motivated by its success in social network link prediction and is closely related to some recently proposed methods for gene-disease association inference. The second method, CATAPULT (Combining dATa Across species using Positive-Unlabeled Learning Techniques), is a supervised machine learning method that uses a biased support vector machine whose features are derived from walks in a heterogeneous gene-trait network. We study the performance of the proposed methods and related state-of-the-art methods using two different evaluation strategies on two distinct data sets, namely OMIM phenotypes and drug-target interactions. We show that although both methods perform very well, the Katz measure is better at identifying associations between traits and poorly studied genes, whereas CATAPULT is better suited to correctly identifying gene-trait associations overall.
The authors thank Jon Laurent and Kris McGary for some of the data used, and Li and Patra for making their code available. Most of Ambuj Tewari's contribution to this work happened while he was a postdoctoral fellow at the University of Texas at Austin.
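The abstract names the Katz measure but does not give its formula. As a rough illustration only, the sketch below uses the standard truncated Katz definition from link prediction, score(i, j) = sum over l = 1..k of beta^l * (A^l)[i][j], where (A^l)[i][j] counts walks of length l between nodes i and j. The toy three-node network, the damping value `beta = 0.5`, the cutoff `max_len = 3`, and the function names are all assumptions made for this example and are not taken from the paper.

```python
def mat_mul(a, b):
    """Multiply two dense matrices given as lists of lists."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def truncated_katz(adj, beta, max_len):
    """Return the matrix sum_{l=1..max_len} beta**l * adj**l.

    adj     : symmetric adjacency matrix (list of lists)
    beta    : damping factor; longer walks contribute less
    max_len : longest walk length included in the sum
    """
    n = len(adj)
    scores = [[0.0] * n for _ in range(n)]
    # power holds adj**l for the current l; start from the identity (l = 0).
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    weight = 1.0
    for _ in range(max_len):
        power = mat_mul(power, adj)
        weight *= beta
        for i in range(n):
            for j in range(n):
                scores[i][j] += weight * power[i][j]
    return scores


# Hypothetical toy network: node 0 = gene_a, node 1 = gene_b, node 2 = trait.
# gene_a and gene_b are functionally linked; gene_b has a known trait link.
ADJ = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]

katz = truncated_katz(ADJ, beta=0.5, max_len=3)
gene_a_trait = katz[0][2]  # no direct edge; reached only via gene_b
gene_b_trait = katz[1][2]  # direct edge plus longer walks
```

In this toy case gene_a receives a nonzero score for the trait purely through its functional link to gene_b, which is the intuition behind using walk-based scores to rank genes that have no known associations of their own.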
Pages: 17