Predicting lncRNA-disease associations using network topological similarity based on deep mining heterogeneous networks

Times cited: 15
Authors
Zhang Hui [1]
Liang Yanchun [1, 2]
Peng Cheng [1]
Han Siyu [1]
Du Wei [1]
Li Ying [1]
Affiliations
[1] Jilin Univ, Minist Educ, Coll Comp Sci & Technol, Key Lab Symbol Computat & Knowledge Engn, Changchun 130012, Jilin, Peoples R China
[2] Jilin Univ, Zhuhai Coll, Minist Educ, Zhuhai Lab Key Lab Symbol Computat & Knowledge En, Zhuhai 519041, Peoples R China
Funding
National Natural Science Foundation of China; Fundamental Research Funds for the Central Universities
Keywords
Deep learning; lncRNA-disease association prediction; Similarity measure; Biological network science; LONG NONCODING RNAS; DATABASE; MICRORNA; CERNA; MODEL; V2.0;
DOI
10.1016/j.mbs.2019.108229
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Long noncoding RNAs (lncRNAs), a class of noncoding RNAs longer than 200 nucleotides, have gained considerable attention in recent decades. Many studies have confirmed that the human genome contains many thousands of lncRNAs. LncRNAs play significant roles in many important biological processes and are implicated in the diagnosis, prognosis, prevention and treatment of complex diseases. For some important diseases such as cancer, lncRNAs have emerged as novel candidate biomarkers. However, research on the role of lncRNAs in human diseases is still in its infancy, and only a small fraction of lncRNA-disease associations have been experimentally verified. Predicting lncRNA-disease associations is an important way to understand the mechanisms and functions of lncRNAs involved in diseases and to enrich lncRNA annotations. It is therefore urgent to prioritize lncRNAs that are potentially associated with diseases. A biological system is a highly complex heterogeneous network involving different kinds of molecules. Consequently, network-based algorithms, which provide a quantifiable characterization of the networks underlying multifarious biological systems, have been extensively applied in the information sciences. However, similarity-based methods for predicting lncRNA-disease associations typically rely on an array of varying features of lncRNAs and diseases and rarely exploit the topology of heterogeneous networks, which captures abundant interactions between biomedical entities. DeepWalk, which encodes the relations between nodes in a continuous vector space, extends language modeling and unsupervised feature learning from sequences of words to networks. In this article, we present a novel lncRNA-disease association prediction method based on DeepWalk, which enhances existing association discovery methods through a topology-based similarity measure.
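DeepWalk, mentioned above, learns node vectors in two stages: it samples truncated random walks from the network, then trains a skip-gram language model that treats each walk as a sentence of node tokens. The sketch below shows only the first stage on a toy lncRNA-miRNA-disease graph. It is a minimal illustration under assumptions, not the authors' code: the node labels, the edge list and the parameters `num_walks` and `walk_length` are hypothetical, and the skip-gram training step is omitted.

```python
import random
from collections import defaultdict


def build_tripartite_graph(edges):
    """Build an undirected adjacency list from (node, node) pairs.

    Node labels are hypothetical strings such as "lnc:L1", "mir:M1", "dis:D1".
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Sort neighbours so that walks are reproducible for a fixed seed.
    return {node: sorted(nbrs) for node, nbrs in adj.items()}


def random_walks(adj, num_walks=10, walk_length=8, seed=0):
    """Generate DeepWalk-style truncated uniform random walks.

    Each pass shuffles the start nodes and launches one walk from every node;
    each step moves to a neighbour chosen uniformly at random. The walks are
    the "sentences" a skip-gram model would be trained on (not shown here).
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)  # random start order per pass, as in DeepWalk
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                nbrs = adj[walk[-1]]
                if not nbrs:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks


if __name__ == "__main__":
    # Toy edges standing in for lncRNA-disease, lncRNA-miRNA and miRNA-disease links.
    edges = [
        ("lnc:L1", "dis:D1"),
        ("lnc:L1", "mir:M1"),
        ("mir:M1", "dis:D2"),
        ("lnc:L2", "mir:M1"),
        ("lnc:L2", "dis:D2"),
    ]
    adj = build_tripartite_graph(edges)
    walks = random_walks(adj, num_walks=2, walk_length=5)
    print(len(walks), walks[0])
```

Each returned walk is a list of node labels. Passing the full list of walks to a skip-gram model as a corpus of sentences would yield one embedding vector per node, which is what the similarity step then operates on.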
We integrate heterogeneous data to construct a Linked Tripartite Network, a heterogeneous network containing three types of nodes generated from linked bioinformatics datasets, and use DeepWalk to extract the topological structure features of its nodes for calculating similarities. The proposed method consists of the following steps. First, we integrate heterogeneous data to construct the Linked Tripartite Network, which contains the known lncRNA-disease, lncRNA-microRNA and microRNA-disease interactions. Second, the topological structure features of the nodes are extracted with DeepWalk. Third, similarity scores for disease-disease pairs and lncRNA-lncRNA pairs are computed from the topology of this network. Finally, new lncRNA-disease associations are discovered by a rule-based inference method using the lncRNA-lncRNA similarities. Using topological similarity derived from the heterogeneous network, the proposed method shows superior performance in predicting lncRNA-disease associations, as measured by the area under the ROC curve (AUC). The DeepWalk-based similarity measure derived from network topology provides a novel perspective that differs from similarities derived from sequence or structure information.
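The last two steps above (similarity scoring and rule-based inference) and the AUC evaluation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the DeepWalk embeddings are already available as plain lists of floats keyed by hypothetical node names, uses cosine similarity between embeddings as the topological similarity, and substitutes a simple similarity-weighted neighbour vote for the paper's rule-based inference, whose exact form is not given in this abstract. The helper names `cosine`, `score_associations` and `auc` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def score_associations(emb, known, lncrnas, diseases):
    """Score every (lncRNA, disease) pair by similarity-weighted neighbour voting.

    ASSUMED rule (illustrative only, not the paper's exact rule):
        score(l, d) = sum of sim(l, l') over l' != l with (l', d) known,
                      divided by the sum of |sim(l, l')| over all l' != l.
    """
    scores = {}
    for l in lncrnas:
        sims = {o: cosine(emb[l], emb[o]) for o in lncrnas if o != l}
        norm = sum(abs(s) for s in sims.values()) or 1.0  # avoid division by zero
        for d in diseases:
            scores[(l, d)] = sum(s for o, s in sims.items() if (o, d) in known) / norm
    return scores


def auc(pos_scores, neg_scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. Both lists must be non-empty.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical 2-D embeddings; real DeepWalk vectors would be much longer.
    emb = {"lnc:L1": [1.0, 0.0], "lnc:L2": [0.9, 0.1], "lnc:L3": [0.0, 1.0]}
    known = {("lnc:L2", "dis:D1"), ("lnc:L3", "dis:D2")}
    scores = score_associations(emb, known, ["lnc:L1", "lnc:L2", "lnc:L3"], ["dis:D1", "dis:D2"])
    print(scores[("lnc:L1", "dis:D1")], scores[("lnc:L1", "dis:D2")])  # prints: 1.0 0.0
```

In the toy run, `lnc:L1` lies close to `lnc:L2` in embedding space, so it inherits a high score for `dis:D1`, the disease already linked to `lnc:L2`. The pairwise AUC shown here is equivalent to the Mann-Whitney statistic. Which known pairs are held out as positives and which unobserved pairs serve as negatives depends on the validation protocol, which this abstract does not specify.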
Pages: 9