Inferring Potential CircRNA-Disease Associations via Deep Autoencoder-Based Classification

Cited by: 43
Authors
Deepthi, K. [1, 2]
Jereesh, A. S. [1]
Affiliations
[1] Cochin Univ Sci & Technol, Bioinformat Lab, Dept Comp Sci, Kochi 682022, Kerala, India
[2] Vadakara CAPE, Coll Engn, Dept Comp Sci, Kozhikkode 673104, Kerala, India
Keywords
CIRCULAR RNAS; ROC CURVE; PREDICTION; ONTOLOGY;
DOI
10.1007/s40291-020-00499-y
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
Aim: Circular RNAs (circRNAs) are endogenous non-coding RNA molecules with a stable circular conformation. Growing evidence from recent experiments reveals that dysregulation and abnormal expression of circRNAs are correlated with complex diseases. Identifying the causal circRNAs behind diseases is therefore invaluable for explaining disease pathogenesis. Because biological experiments are difficult, slow, and prohibitively expensive, computational approaches are necessary for identifying the relationships between circRNAs and diseases.
Methods: We propose an ensemble method called AE-RF, based on a deep autoencoder and a random forest classifier, to predict potential circRNA-disease associations. The method first integrates circRNA and disease similarities to construct features. The integrated features are passed to the deep autoencoder, which extracts hidden biological patterns. The random forest classifier is then trained on the extracted deep features to predict associations.
Results and discussion: AE-RF achieved AUC scores of 0.9486 and 0.9522 in fivefold and tenfold cross-validation experiments, respectively. We conducted case studies on the top-ranked predictions and on three common human cancers, and we compared the method with state-of-the-art classifiers and related methods. The experimental results and case studies demonstrate the predictive power of the model and show that it outperforms previous methods with a high degree of robustness. Training the classifier on the distinctive features extracted by the autoencoder enhanced the model's predictive performance. The top predicted circRNAs are promising candidates for further biological tests.
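The feature-construction and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the similarities are integrated, so concatenating one circRNA similarity row with one disease similarity row is an assumption, and the function names (`pair_features`, `kfold_indices`, `auc_score`) and toy matrices are hypothetical. The deep autoencoder and random forest stages are omitted; in a full pipeline they would sit between feature construction and scoring.

```python
import random
from typing import List, Sequence


def pair_features(circ_sim: Sequence[Sequence[float]],
                  dis_sim: Sequence[Sequence[float]],
                  i: int, j: int) -> List[float]:
    """Feature vector for the pair (circRNA i, disease j).

    Assumption: integration is plain concatenation of the two similarity rows.
    """
    return list(circ_sim[i]) + list(dis_sim[j])


def kfold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle sample indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def auc_score(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    Tied scores receive the average of the ranks they span.
    """
    order = sorted(range(len(scores)), key=lambda t: scores[t])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1  # 1-based average rank of the tie group
        for t in range(pos, end + 1):
            ranks[order[t]] = avg_rank
        pos = end + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy similarity matrices (hypothetical values): 2 circRNAs, 3 diseases.
circ_sim = [[1.0, 0.2], [0.2, 1.0]]
dis_sim = [[1.0, 0.5, 0.1], [0.5, 1.0, 0.3], [0.1, 0.3, 1.0]]

features = pair_features(circ_sim, dis_sim, 0, 2)  # length 2 + 3 = 5
folds = kfold_indices(n=10, k=5)                   # five folds of two indices each
auc = auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

In a cross-validation run, each fold in turn would serve as the held-out set, the classifier's predicted scores for that fold would be passed to `auc_score`, and the per-fold values would be averaged.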
Pages: 87-97
Page count: 11