A clustering-based sampling method for miRNA-disease association prediction

Cited by: 3
Authors
Wei, Zheng [1]
Yao, Dengju [1]
Zhan, Xiaojuan [1,2]
Zhang, Shuli [1]
Affiliations
[1] Harbin Univ Sci & Technol, Sch Comp Sci & Technol, Harbin, Peoples R China
[2] Heilongjiang Inst Technol, Coll Comp Sci & Technol, Harbin, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
miRNA-disease association; ensemble learning; clustering; sampling; computational methods; microRNAs
DOI
10.3389/fgene.2022.995535
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
A growing number of studies have shown that microRNAs (miRNAs) play a critical role in regulating gene expression, and that aberrant miRNA expression is often associated with a variety of complex human diseases. Because identifying disease-associated miRNAs through biological experiments is costly and inefficient, researchers have turned to computational methods to predict potential disease-associated miRNAs. Existing methods, however, construct the negative sample set in ways that can introduce potential positive samples, so we proposed a clustering-based sampling method for miRNA-disease association prediction (CSMDA). First, we integrated multiple types of miRNA and disease similarity information to represent miRNA-disease pairs. Second, we applied a clustering-based sampling method to avoid introducing potential positive samples when constructing the negative sample set. Third, we used a random forest-based feature selection method to reduce noise and redundant information in the high-dimensional feature space. Finally, we implemented an ensemble learning framework that predicts miRNA-disease associations by soft voting. Under five-fold cross-validation, CSMDA achieved a Precision of 0.9676, a Recall of 0.9545, an F1-score of 0.9610, an AUROC of 0.9928, and an AUPR of 0.9940. In addition, case studies on three cancers showed that the top 20 potentially associated miRNAs predicted by CSMDA were confirmed by the dbDEMC database or the literature. These results demonstrate that CSMDA can accurately predict potential disease-associated miRNAs.
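The abstract does not spell out how the clustering-based sampling or the soft-voting ensemble actually work. The Python sketch below is a hypothetical illustration, not CSMDA's published procedure. It assumes miRNA-disease pairs are already encoded as numeric feature vectors, clusters the unlabeled pairs with plain k-means, and draws negatives from the clusters whose centroids lie farthest from the centroid of the known positive pairs. It also shows soft voting as averaging the positive-class probabilities of several base classifiers. All function names (`kmeans`, `cluster_negative_sampling`, `soft_vote`), parameters (`k`, `n_neg`, `seed`), and the toy data are assumptions made for illustration.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = List[float]


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _mean(points: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points: List[Vector], k: int, iters: int = 50, seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: _dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centroids.append(_mean(members) if members else centroids[c])
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return labels


def cluster_negative_sampling(
    positives: List[Vector], unlabeled: List[Vector], n_neg: int, k: int = 4, seed: int = 0
) -> List[int]:
    """HYPOTHETICAL sampling rule (not the paper's exact method): return indices
    of n_neg unlabeled pairs taken from the clusters farthest from the positives."""
    labels = kmeans(unlabeled, k, seed=seed)
    pos_center = _mean(positives)
    clusters: Dict[int, List[int]] = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    # Farthest clusters first: assumed least likely to hide true associations.
    ranked = sorted(
        clusters.values(),
        key=lambda idxs: _dist(_mean([unlabeled[i] for i in idxs]), pos_center),
        reverse=True,
    )
    chosen: List[int] = []
    for idxs in ranked:
        for i in idxs:
            if len(chosen) == n_neg:
                return chosen
            chosen.append(i)
    return chosen


def soft_vote(prob_lists: Sequence[Sequence[float]]) -> List[float]:
    """Soft voting: average each sample's positive-class probability across models."""
    return [sum(ps) / len(prob_lists) for ps in zip(*prob_lists)]


if __name__ == "__main__":
    positives = [[0.9, 0.8], [0.85, 0.9], [0.95, 0.85]]
    unlabeled = [[0.88, 0.82], [0.1, 0.2], [0.15, 0.1], [0.9, 0.9], [0.2, 0.15]]
    print(cluster_negative_sampling(positives, unlabeled, n_neg=3, k=2))  # [1, 2, 4]
    print(soft_vote([[0.75, 0.25], [0.5, 0.5]]))  # [0.625, 0.375]
```

On the toy data, the two unlabeled pairs that sit close to the known positives (indices 0 and 3) are skipped, and the three distant pairs become negatives. As a consistency check on the reported metrics, F1 = 2PR/(P + R) = 2 × 0.9676 × 0.9545 / (0.9676 + 0.9545) ≈ 0.9610, which matches the stated F1-score.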
Pages: 12