A clustering-based sampling method for miRNA-disease association prediction

Cited by: 2
Authors
Wei, Zheng [1]
Yao, Dengju [1]
Zhan, Xiaojuan [1, 2]
Zhang, Shuli [1]
Affiliations
[1] Harbin Univ Sci & Technol, Sch Comp Sci & Technol, Harbin, Peoples R China
[2] Heilongjiang Inst Technol, Coll Comp Sci & Technol, Harbin, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
miRNA-disease association; ensemble learning; clustering; sampling; computational methods; MICRORNAS;
DOI
10.3389/fgene.2022.995535
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
A growing number of studies have shown that microRNAs (miRNAs) play a critical role in regulating gene expression, and that abnormal miRNA expression is often associated with a variety of complex human diseases. Because identifying disease-associated miRNAs through biological experiments is costly and inefficient, researchers have turned to computational methods for predicting potential disease-associated miRNAs. Because existing methods are flawed in how they construct the negative sample set, we propose a clustering-based sampling method for miRNA-disease association prediction (CSMDA). First, we integrate multiple kinds of similarity information for miRNAs and diseases to represent miRNA-disease pairs. Second, we apply clustering-based sampling to avoid introducing potential positive samples when constructing the negative sample set. Third, we use a random forest-based feature selection method to reduce noise and redundant information in the high-dimensional feature space. Finally, we implement an ensemble learning framework that predicts miRNA-disease associations by soft voting. Under five-fold cross-validation, CSMDA achieved a Precision, Recall, F1-score, AUROC, and AUPR of 0.9676, 0.9545, 0.9610, 0.9928, and 0.9940, respectively. In addition, case studies on three cancers showed that the top 20 potentially associated miRNAs predicted by CSMDA were confirmed by the dbDEMC database or the literature. These results demonstrate that CSMDA can predict potential disease-associated miRNAs more accurately.
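The abstract names clustering-based negative sampling and soft voting but does not spell out either step. The sketch below is one plausible reading, not the authors' implementation: it assumes k-means as the clustering algorithm, a round-robin draw so that each cluster contributes a similar number of negatives, and soft voting as a plain average of per-model positive-class probabilities. The function names (`kmeans`, `sample_negatives`, `soft_vote`) and the toy feature vectors are hypothetical.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def sample_negatives(unlabeled, n_neg, k=3, seed=0):
    """Pick up to n_neg indices of unlabeled pairs, spread across k clusters.

    Assumption: drawing evenly from every cluster keeps the negative set
    from concentrating in one region of feature space.
    """
    rng = random.Random(seed)
    labels = kmeans(unlabeled, k, seed=seed)
    clusters = [[i for i, lab in enumerate(labels) if lab == c] for c in range(k)]
    for members in clusters:
        rng.shuffle(members)
    chosen = []
    # Round-robin over clusters until n_neg is reached or all are exhausted.
    while len(chosen) < n_neg and any(clusters):
        for members in clusters:
            if members and len(chosen) < n_neg:
                chosen.append(members.pop())
    return sorted(chosen)


def soft_vote(prob_per_model):
    """Average the positive-class probabilities predicted by each model."""
    n_models = len(prob_per_model)
    return [sum(ps) / n_models for ps in zip(*prob_per_model)]


# Toy feature vectors standing in for unlabeled miRNA-disease pairs.
unlabeled = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0], [9.0, 0.0], [9.1, 0.1]]
negatives = sample_negatives(unlabeled, n_neg=3, k=3)

# Two hypothetical classifiers scoring the same two candidate pairs.
scores = soft_vote([[0.9, 0.2], [0.7, 0.4]])
```

In practice the negatives would be drawn from the unknown miRNA-disease pairs after feature construction, and the averaged scores would be thresholded or ranked to produce predictions.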
Pages: 12