Disease gene identification by using graph kernels and Markov random fields

Cited by: 32
Authors
Chen BoLin [1]
Li Min [2]
Wang JianXin [2]
Wu FangXiang [1, 3]
Affiliations
[1] Univ Saskatchewan, Div Biomed Engn, Saskatoon, SK S7N 5A9, Canada
[2] Cent South Univ, Sch Informat Sci & Engn, Changsha 410083, Peoples R China
[3] Univ Saskatchewan, Dept Mech Engn, Saskatoon, SK S7N 5A9, Canada
Funding
National Natural Science Foundation of China; Natural Sciences and Engineering Research Council of Canada;
Keywords
disease gene identification; data integration; Markov random field; graph kernel; Bayesian analysis; PREDICTING PROTEIN FUNCTION; FUNCTIONAL MODULES; PRIORITIZING GENES; NETWORK; COMPLEXES; INTERACTOME; EXPRESSION; ALGORITHM; KNOWLEDGE; RESOURCE;
DOI
10.1007/s11427-014-4745-8
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
Genes associated with similar diseases are often functionally related. This principle is largely supported by many biological data sources, such as disease phenotype similarities, protein complexes, protein-protein interactions, pathways and gene expression profiles. Integrating multiple types of biological data is an effective method to identify disease genes for many genetic diseases. To capture the gene-disease associations based on biological networks, a kernel-based Markov random field (MRF) method is proposed by combining graph kernels and the MRF method. In the proposed method, three kinds of kernels are employed to describe the overall relationships of vertices in five biological networks, respectively, and a novel weighted MRF method is developed to integrate those data. In addition, an improved Gibbs sampling procedure and a novel parameter estimation method are proposed to generate predictions from the kernel-based MRF method. Numerical experiments are carried out by integrating known gene-disease associations, protein complexes, protein-protein interactions, pathways and gene expression profiles. The proposed kernel-based MRF method is evaluated by the leave-one-out cross validation paradigm, achieving an AUC score of 0.771 when integrating all those biological data in our experiments, which indicates that our proposed method is very promising compared with many existing methods.
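To illustrate what a graph kernel contributes here, the sketch below computes a diffusion kernel on a toy network. This is an assumption made for illustration only: the abstract does not say which three kernels the authors use, and the function name `diffusion_kernel`, the parameter `beta`, and the four-node path graph are all hypothetical. The point is that a kernel assigns a similarity to every pair of vertices, including pairs that are not directly connected, which is the kind of overall vertex relationship the abstract describes.

```python
import numpy as np


def diffusion_kernel(adjacency, beta=0.5):
    """Return exp(-beta * L) for the graph Laplacian L = D - A.

    Illustrative only: the kernels used in the paper are not named
    in the abstract, and beta is an arbitrary smoothing parameter.
    """
    a = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(a.sum(axis=1)) - a
    # L is symmetric, so the matrix exponential follows from its
    # eigendecomposition: exp(-beta L) = V diag(exp(-beta w)) V^T.
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    return (eigvecs * np.exp(-beta * eigvals)) @ eigvecs.T


# Hypothetical toy network: a path of four genes, 0 - 1 - 2 - 3.
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
kernel = diffusion_kernel(adjacency)
```

In this toy example, genes 0 and 3 share no edge, yet `kernel[0, 3]` is positive, and it is smaller than `kernel[0, 1]` because similarity decays with distance in the network.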
Pages: 1054-1063
Page count: 10
Related papers
46 in total
[1]  
[Anonymous], 2002, P 19 INT C MACH LEAR
[2]  
BESAG J, 1974, J ROY STAT SOC B MET, V36, P192
[3]  
Bolin Chen, 2013, 2013 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), DOI 10.1109/BIBM.2013.6732576
[4]   Identifying disease genes by integrating multiple data sources [J].
Chen, Bolin ;
Wang, Jianxin ;
Li, Min ;
Wu, Fang-Xiang .
BMC MEDICAL GENOMICS, 2014, 7
[5]   In Silico Gene Prioritization by Integrating Multiple Data Sources [J].
Chen, Yixuan ;
Wang, Wenhui ;
Zhou, Yingyao ;
Shields, Robert ;
Chanda, Sumit K. ;
Elston, Robert C. ;
Li, Jing .
PLOS ONE, 2011, 6 (06)
[6]   An integrated probabilistic model for functional prediction of proteins [J].
Deng, MH ;
Chen, T ;
Sun, FZ .
JOURNAL OF COMPUTATIONAL BIOLOGY, 2004, 11 (2-3) :463-475
[7]   Mapping gene ontology to proteins based on protein-protein interaction data [J].
Deng, MH ;
Tu, ZD ;
Sun, FZ ;
Chen, T .
BIOINFORMATICS, 2004, 20 (06) :895-902
[8]   Prediction of protein function using protein-protein interaction data [J].
Deng, MH ;
Zhang, K ;
Mehta, S ;
Chen, T ;
Sun, FZ .
JOURNAL OF COMPUTATIONAL BIOLOGY, 2003, 10 (06) :947-960
[9]   The human disease network [J].
Goh, Kwang-Il ;
Cusick, Michael E. ;
Valle, David ;
Childs, Barton ;
Vidal, Marc ;
Barabasi, Albert-Laszlo .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2007, 104 (21) :8685-8690
[10]   Inferring disease and gene set associations with rank coherence in networks [J].
Hwang, TaeHyun ;
Zhang, Wei ;
Xie, Maoqiang ;
Liu, Jinfeng ;
Kuang, Rui .
BIOINFORMATICS, 2011, 27 (19) :2692-2699