Research on Plant RNA-Binding Protein Prediction Method Based on Improved Ensemble Learning

Times cited: 0
Authors
Zhang, Hongwei [1]
Shi, Yan [2]
Wang, Yapeng [1]
Yang, Xu [1]
Li, Kefeng [3]
Im, Sio-Kei [1]
Han, Yu [4]
Affiliations
[1] Macao Polytech Univ, Fac Appl Sci, Macau 999078, Peoples R China
[2] Beijing Univ Posts & Telecommun, State Key Lab Networking & Switching Technol, Beijing 100876, Peoples R China
[3] Macao Polytech Univ, Fac Appl Sci, Ctr Artificial Intelligence Driven Drug Discovery, Macau 999078, Peoples R China
[4] Southwest Forestry Univ, Fac Civil Engn, Kunming 650224, Peoples R China
Source
BIOLOGY-BASEL | 2025 / Vol. 14 / Issue 06
Keywords
plant; RNA-binding proteins; RBPs; TextCNN; ensemble learning; SITES; INTERACT; DNA;
DOI
10.3390/biology14060672
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
(1) Background: RNA-binding proteins (RBPs) play a crucial role in regulating gene expression in plants, affecting growth, development, and stress responses. Accurate prediction of plant-specific RBPs is vital for understanding gene regulation and enhancing genetic improvement. (2) Methods: We propose an ensemble learning method that combines shallow and deep learning. It feeds the prediction results from SVM, LR, LDA, and LightGBM into an enhanced TextCNN. K-Peptide Composition (KPC) encoding (k = 1, 2) forms a 420-dimensional feature vector, which is extended to 424 dimensions by appending those four prediction outputs. Redundancy is minimized using a Pearson correlation threshold of 0.80. (3) Results: On the benchmark dataset of 4992 sequences, our method achieved ACCs of 97.20% and 97.06% under 5-fold and 10-fold cross-validation, respectively. On an independent dataset of 1086 sequences, our method attained an ACC of 99.72%, an F1-score of 99.72%, an MCC of 99.45%, an SN of 99.63%, and an SP of 99.82%, outperforming RBPLight by 12.98 percentage points in ACC and the original TextCNN by 25.23 percentage points. (4) Conclusions: These results highlight our method's superior accuracy and efficiency over PSSM-based approaches, enabling large-scale plant RBP prediction.
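
The feature dimensions in the Methods follow from simple counting: k = 1 gives 20 amino-acid features, k = 2 gives 20 × 20 = 400 dipeptide features, so 420 in total, and appending four base-classifier outputs gives 424. The sketch below illustrates that arithmetic only. It is not the authors' implementation: the function names `kpc_features` and `stacked_input`, the normalization of counts to frequencies, the skipping of non-standard residues, and the example sequence and prediction values are all assumptions made for illustration.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kpc_features(sequence, ks=(1, 2)):
    """Concatenate k-peptide frequencies for each k in ks (assumed normalization)."""
    seq = sequence.upper()
    features = []
    for k in ks:
        # Enumerate all 20**k possible k-peptides in a fixed order.
        peptides = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
        counts = dict.fromkeys(peptides, 0)
        windows = 0
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in counts:  # assumption: skip windows with non-standard residues
                counts[kmer] += 1
                windows += 1
        # Divide by the number of counted windows; all zeros if none were counted.
        features.extend(counts[p] / windows if windows else 0.0 for p in peptides)
    return features


def stacked_input(sequence, base_predictions):
    """Append base-classifier outputs (hypothetical values) to the KPC vector."""
    return kpc_features(sequence) + list(base_predictions)


if __name__ == "__main__":
    vec = stacked_input("MKVLAAGIVK", [0.9, 0.8, 0.7, 0.95])
    print(len(vec))
```

With k = 1 and k = 2, `kpc_features` returns 20 + 400 values, and `stacked_input` adds one value per base classifier, matching the 420 and 424 dimensions quoted in the abstract.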
Pages: 23