An ensemble of reduced alphabets with protein encoding based on grouped weight for predicting DNA-binding proteins

Cited by: 24
Authors
Nanni, Loris [1]
Lumini, Alessandra [1]
Affiliation
[1] Univ Bologna, DEIS, CNR, IEIIT, I-40136 Bologna, Italy
Keywords
Multi-classifier; Amino-acid alphabets; Support vector machine; DNA-binding proteins; Ensemble classifier; AMINO-ACID-COMPOSITION; SUPPORT VECTOR MACHINE; SUBCELLULAR LOCATION PREDICTION; STRUCTURAL CLASS PREDICTION; COMPLEXITY MEASURE FACTOR; ENZYME SUBFAMILY CLASSES; IMPROVED HYBRID APPROACH; WEB-SERVER; CELLULAR-AUTOMATA; SUBNUCLEAR LOCALIZATION;
DOI
10.1007/s00726-008-0044-7
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
It is well known in the literature that an ensemble of classifiers obtains good performance with respect to that obtained by a stand-alone method. Hence, it is very important to develop ensemble methods well suited for bioinformatics data. In this work, we propose to combine the feature extraction method based on grouped weight with a set of amino-acid alphabets obtained by a Genetic Algorithm. The proposed method is applied for predicting DNA-binding proteins. As classifiers, the linear support vector machine and the radial basis function support vector machine are tested. As performance indicators, the accuracy and Matthews's correlation coefficient are reported. Matthews's correlation coefficient obtained by our ensemble method is approximately 0.97 when the jackknife cross-validation is used. This result outperforms the performance obtained in the literature using the same dataset where the features are extracted directly from the amino-acid sequence.
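As a rough illustration of two ideas named in the abstract, the sketch below projects a protein sequence onto a reduced amino-acid alphabet and computes Matthews's correlation coefficient from confusion-matrix counts. The four-group alphabet, the names `GROUPS`, `reduce_sequence` and `matthews_cc` are assumptions made for this example only; the alphabets used in the paper are found by a Genetic Algorithm and are not given in this record, and the grouped-weight encoding itself is not reproduced here.

```python
import math

# Hypothetical 4-group reduced alphabet (illustration only, not the paper's).
GROUPS = {
    "hydrophobic": "AVLIMFWC",
    "polar": "GSTYPNQ",
    "positive": "KRH",
    "negative": "DE",
}

# Map each of the 20 standard amino acids to the index of its group.
LETTER_TO_GROUP = {
    aa: str(i)
    for i, members in enumerate(GROUPS.values())
    for aa in members
}


def reduce_sequence(seq):
    """Rewrite a protein sequence using group indices instead of residues."""
    return "".join(LETTER_TO_GROUP[aa] for aa in seq.upper())


def matthews_cc(tp, tn, fp, fn):
    """Matthews's correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

A classifier that gets every protein right yields a coefficient of 1, one that gets every protein wrong yields -1, and a value near 0.97 as reported above indicates very few misclassifications in either class.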
Pages: 167-175
Number of pages: 9