An ensemble of reduced alphabets with protein encoding based on grouped weight for predicting DNA-binding proteins

Cited by: 24
Authors
Nanni, Loris [1]
Lumini, Alessandra [1]
Affiliation
[1] Univ Bologna, DEIS, CNR, IEIIT, I-40136 Bologna, Italy
Keywords
Multi-classifier; Amino-acid alphabets; Support vector machine; DNA-binding proteins; Ensemble classifier; AMINO-ACID-COMPOSITION; SUPPORT VECTOR MACHINE; SUBCELLULAR LOCATION PREDICTION; STRUCTURAL CLASS PREDICTION; COMPLEXITY MEASURE FACTOR; ENZYME SUBFAMILY CLASSES; IMPROVED HYBRID APPROACH; WEB-SERVER; CELLULAR-AUTOMATA; SUBNUCLEAR LOCALIZATION;
DOI
10.1007/s00726-008-0044-7
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular biology]
Discipline classification codes
071010; 081704
Abstract
It is well known in the literature that an ensemble of classifiers obtains good performance with respect to that obtained by a stand-alone method. Hence, it is very important to develop ensemble methods well suited for bioinformatics data. In this work, we propose to combine the feature extraction method based on grouped weight with a set of amino-acid alphabets obtained by a Genetic Algorithm. The proposed method is applied for predicting DNA-binding proteins. As classifiers, the linear support vector machine and the radial basis function support vector machine are tested. As performance indicators, the accuracy and Matthews's correlation coefficient are reported. Matthews's correlation coefficient obtained by our ensemble method is approximately 0.97 when the jackknife cross-validation is used. This result outperforms the performance obtained in the literature using the same dataset where the features are extracted directly from the amino-acid sequence.
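The two ingredients the abstract names, a reduced amino-acid alphabet and Matthews's correlation coefficient, can be illustrated with a minimal Python sketch. The grouping in `EXAMPLE_GROUPS` and the function names are hypothetical and chosen only for illustration; the alphabets in the paper were obtained by a Genetic Algorithm and are not listed in this record. The sketch also does not implement the grouped-weight encoding or the support vector machine ensemble.

```python
import math

# Hypothetical reduced alphabet: the 20 amino acids merged into 6 groups.
# Illustrative only; NOT the alphabets found by the paper's Genetic Algorithm.
EXAMPLE_GROUPS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]


def build_mapping(groups):
    """Map each amino-acid letter to the index of its group (as a string)."""
    return {aa: str(i) for i, group in enumerate(groups) for aa in group}


def reduce_sequence(seq, mapping):
    """Rewrite a protein sequence over the reduced alphabet."""
    return "".join(mapping[aa] for aa in seq)


def mcc(tp, tn, fp, fn):
    """Matthews's correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


mapping = build_mapping(EXAMPLE_GROUPS)
reduced = reduce_sequence("AKDG", mapping)
perfect = mcc(tp=50, tn=50, fp=0, fn=0)
```

In an ensemble of the kind the abstract describes, each reduced alphabet would give its own encoding of the same sequence, with one classifier trained per encoding and the outputs then combined.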
Pages: 167-175
Number of pages: 9