DBPboost: A method of classification of DNA-binding proteins based on improved differential evolution algorithm and feature extraction

Cited by: 5
Authors
Sun, Ailun [1]
Li, Hongfei [2]
Dong, Guanghui [1]
Zhao, Yuming [1]
Zhang, Dandan [3]
Affiliations
[1] Northeast Forestry Univ, Coll Comp & Control Engn, Harbin 150040, Peoples R China
[2] Northeast Forestry Univ, Coll Life Sci, Harbin 150040, Peoples R China
[3] Harbin Med Univ, Affiliated Hosp 1, Dept Obstet & Gynecol, Harbin, Heilongjiang, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
Protein structure prediction; Differential evolution; Dipeptide position-specific scoring matrix; Extreme gradient boosting decision tree; Random forest; Support vector machine; WEB SERVER; PREDICTION; IDENTIFICATION; PERFORMANCE; PSEAAC; DPP;
DOI
10.1016/j.ymeth.2024.01.005
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
DNA-binding proteins are proteins that interact with DNA molecules through physical and chemical interactions. Their main functions include regulating gene expression and maintaining chromosome structure and stability. They play a crucial role in cellular and molecular biology because they are essential for maintaining normal cellular physiological functions and for adapting to environmental changes. Predicting DNA-binding proteins has long been an active topic in bioinformatics. The key to classifying them accurately is to find suitable feature sources and to exploit the information those sources contain. Although many models for predicting DNA-binding proteins already exist, there is still room for improvement in how feature source information is mined and in the calculation methods used. In this study, we created a model called DBPboost to better identify DNA-binding proteins. The study makes three contributions: it uses eight feature extraction methods; it improves the feature selection step by selecting a subset of features first and then performing feature selection again after feature fusion; and it optimizes the differential evolution algorithm used in feature fusion, which improves fusion performance. Experimental results show that the model achieves a prediction accuracy of 89.32% and a sensitivity of 89.01% on the UniSwiss dataset, which is better than most existing models.
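The abstract mentions differential evolution as the optimizer used during feature fusion but gives no details of the improved variant. As a rough illustration only, the sketch below shows the classic DE/rand/1/bin scheme searching for one weight per feature group. It is not the authors' algorithm: the function names, the parameter values, and the toy objective (squared distance to a made-up target vector) are all assumptions. In a real pipeline the objective would more plausibly be a cross-validated classification error computed on the weighted, fused features.

```python
import random


def differential_evolution(fitness, dim, pop_size=20, f=0.5, cr=0.9,
                           generations=200, seed=0):
    """Minimise `fitness` over [0, 1]^dim with classic DE/rand/1/bin.

    This is the textbook variant, not the improved one from the paper.
    """
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, none of them individual i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # forces at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    trial.append(min(1.0, max(0.0, v)))  # clip to [0, 1]
                else:
                    trial.append(pop[i][j])
            s = fitness(trial)
            if s <= scores[i]:  # greedy one-to-one replacement
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


# Hypothetical stand-in objective: pretend the ideal fusion weights for
# four feature groups are known. These numbers are invented for the demo.
TARGET = [0.8, 0.1, 0.5, 0.3]


def toy_fitness(weights):
    return sum((w - t) ** 2 for w, t in zip(weights, TARGET))


best_weights, best_score = differential_evolution(toy_fitness, dim=len(TARGET))
print(best_weights, best_score)
```

Swapping `toy_fitness` for a function that trains and scores a classifier on the weighted features would turn this into a fusion-weight search. Each fitness call then becomes expensive, so the population size and generation count would need tuning.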
Pages: 56-64
Number of pages: 9