An ensemble of reduced alphabets with protein encoding based on grouped weight for predicting DNA-binding proteins

Cited by: 24
Authors
Nanni, Loris [1 ]
Lumini, Alessandra [1 ]
Affiliations
[1] Univ Bologna, DEIS, CNR, IEIIT, I-40136 Bologna, Italy
Keywords
Multi-classifier; Amino-acid alphabets; Support vector machine; DNA-binding proteins; Ensemble classifier; AMINO-ACID-COMPOSITION; SUPPORT VECTOR MACHINE; SUBCELLULAR LOCATION PREDICTION; STRUCTURAL CLASS PREDICTION; COMPLEXITY MEASURE FACTOR; ENZYME SUBFAMILY CLASSES; IMPROVED HYBRID APPROACH; WEB-SERVER; CELLULAR-AUTOMATA; SUBNUCLEAR LOCALIZATION;
DOI
10.1007/s00726-008-0044-7
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
It is well known in the literature that an ensemble of classifiers generally performs well compared with a stand-alone method. Hence, it is very important to develop ensemble methods well suited to bioinformatics data. In this work, we propose combining the feature extraction method based on grouped weight with a set of amino-acid alphabets obtained by a Genetic Algorithm. The proposed method is applied to predicting DNA-binding proteins. As classifiers, the linear support vector machine and the radial basis function support vector machine are tested. As performance indicators, the accuracy and Matthews's correlation coefficient are reported. The Matthews's correlation coefficient obtained by our ensemble method is ≈ 0.97 when jackknife cross-validation is used. This result exceeds the performance reported in the literature on the same dataset when the features are extracted directly from the amino-acid sequence.
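As a minimal illustration of two ideas named in the abstract, the sketch below maps a protein sequence onto a reduced amino-acid alphabet and computes Matthews's correlation coefficient from confusion-matrix counts. The four-group alphabet `EXAMPLE_GROUPS` and the function names are assumptions made for this example; the alphabets used in the paper are found by a Genetic Algorithm and are not listed in this record, and the grouped-weight encoding itself is not reproduced here.

```python
import math

# Hypothetical 4-group reduced alphabet, for illustration only.
# The paper's alphabets come from a Genetic Algorithm and are not given here.
EXAMPLE_GROUPS = {
    "A": "AVLIMFWPG",  # mostly hydrophobic residues
    "B": "STYCNQ",     # polar, uncharged residues
    "C": "KRH",        # positively charged residues
    "D": "DE",         # negatively charged residues
}


def reduce_sequence(seq, groups=EXAMPLE_GROUPS):
    """Replace each residue with the label of its group; unknown residues become 'X'."""
    lookup = {aa: label for label, members in groups.items() for aa in members}
    return "".join(lookup.get(aa, "X") for aa in seq.upper())


def matthews_cc(tp, tn, fp, fn):
    """Matthews's correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

In an ensemble of the kind the abstract describes, each reduced alphabet would presumably yield its own feature vector and classifier, with the classifier outputs combined afterwards; that combination step is not shown here.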
Pages: 167 - 175
Page count: 9
Related papers
50 in total
  • [21] EmbedCaps-DBP: Predicting DNA-Binding Proteins Using Protein Sequence Embedding and Capsule Network
    Naim, Muhammad Khaerul
    Mengko, Tati Rajab
    Hertadi, Rukman
    Purwarianti, Ayu
    Susanty, Meredita
    IEEE ACCESS, 2023, 11 : 121256 - 121268
  • [22] RF-SVM: Identification of DNA-binding proteins based on comprehensive feature representation methods and support vector machine
    Zhang, Yanping
    Ni, Jianwei
    Gao, Ya
    PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2022, 90 (02) : 395 - 404
  • [23] DBP-GAPred: An intelligent method for prediction of DNA-binding proteins types by enhanced evolutionary profile features with ensemble learning
    Barukab, Omar
    Ali, Farman
    Khan, Sher Afzal
    JOURNAL OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, 2021, 19 (04)
  • [24] TargetDBP: Accurate DNA-Binding Protein Prediction Via Sequence-Based Multi-View Feature Learning
    Hu, Jun
    Zhou, Xiao-Gen
    Zhu, Yi-Heng
    Yu, Dong-Jun
    Zhang, Gui-Jun
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2020, 17 (04) : 1419 - 1429
  • [25] Predicting DNA-binding Proteins Using Feature Fusion and MSVM-RFE
    Ji, Guoli
    Lin, Yang
    Lin, Qiamnin
    Huang, Guangzao
    Zhu, Wenbing
    You, Wenjie
    PROCEEDINGS OF 2016 10TH IEEE INTERNATIONAL CONFERENCE ON ANTI-COUNTERFEITING, SECURITY, AND IDENTIFICATION (ASID), 2016, : 109 - 112
  • [26] Predicting DNA-Binding Proteins and Binding Residues by Complex Structure Prediction and Application to Human Proteome
    Zhao, Huiying
    Wang, Jihua
    Zhou, Yaoqi
    Yang, Yuedong
    PLOS ONE, 2014, 9 (05):
  • [27] StackDPPred: a stacking based prediction of DNA-binding protein from sequence
    Mishra, Avdesh
    Pokhrel, Pujan
    Hoque, Md Tamjidul
    BIOINFORMATICS, 2019, 35 (03) : 433 - 441
  • [28] nDNA-prot: identification of DNA-binding proteins based on unbalanced classification
    Song, Li
    Li, Dapeng
    Zeng, Xiangxiang
    Wu, Yunfeng
    Guo, Li
    Zou, Quan
    BMC BIOINFORMATICS, 2014, 15
  • [29] Predicting DNA-binding locations and orientation on proteins using knowledge-based learning of geometric properties
    Wang, Chien-Chih
    Chen, Chien-Yu
    PROTEOME SCIENCE, 2011, 9
  • [30] Predicting DNA-binding proteins: approached from Chou's pseudo amino acid composition and other specific sequence features
    Fang, Y.
    Guo, Y.
    Feng, Y.
    Li, M.
    AMINO ACIDS, 2008, 34 (01) : 103 - 109