An ensemble of reduced alphabets with protein encoding based on grouped weight for predicting DNA-binding proteins

Cited by: 24
Authors
Nanni, Loris [1]
Lumini, Alessandra [1]
Affiliation
[1] Univ Bologna, DEIS, CNR, IEIIT, I-40136 Bologna, Italy
Keywords
Multi-classifier; Amino-acid alphabets; Support vector machine; DNA-binding proteins; Ensemble classifier; AMINO-ACID-COMPOSITION; SUPPORT VECTOR MACHINE; SUBCELLULAR LOCATION PREDICTION; STRUCTURAL CLASS PREDICTION; COMPLEXITY MEASURE FACTOR; ENZYME SUBFAMILY CLASSES; IMPROVED HYBRID APPROACH; WEB-SERVER; CELLULAR-AUTOMATA; SUBNUCLEAR LOCALIZATION;
DOI
10.1007/s00726-008-0044-7
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Subject classification code
071010 ; 081704 ;
Abstract
It is well known in the literature that an ensemble of classifiers generally performs better than a stand-alone method. Hence, it is very important to develop ensemble methods well suited for bioinformatics data. In this work, we propose to combine the feature extraction method based on grouped weight with a set of amino-acid alphabets obtained by a Genetic Algorithm. The proposed method is applied to predicting DNA-binding proteins. As classifiers, the linear support vector machine and the radial basis function support vector machine are tested. As performance indicators, the accuracy and Matthews's correlation coefficient are reported. Matthews's correlation coefficient obtained by our ensemble method is approximately 0.97 when the jackknife cross-validation is used. This result exceeds the performance reported in the literature on the same dataset where the features are extracted directly from the amino-acid sequence.
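The two performance indicators named in the abstract can be computed from the counts of a binary confusion matrix. The following is a minimal sketch under stated assumptions, not the authors' evaluation code: the function names and the label convention (1 for DNA-binding, 0 for non-binding) are hypothetical, and the toy labels are illustrative only and are not taken from the paper.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of samples classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient, in [-1, 1].

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy labels for illustration only (assumption, not data from the paper).
    y_true = [1, 1, 1, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 1]
    counts = confusion_counts(y_true, y_pred)
    print("accuracy:", accuracy(*counts))
    print("MCC:", matthews_corrcoef(*counts))
```

Under jackknife (leave-one-out) cross-validation, each sample is predicted by a model trained on all the other samples, and the pooled predictions are then scored with these two functions.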
Pages: 167-175
Number of pages: 9