Sequence-based Prediction of Antimicrobial Peptides with CatBoost Classifier

Cited by: 0
Authors
Yu, Jen-Chieh [1]
Ni, Kuan [1]
Chen, Ching-Tai [2]
Affiliations
[1] Asia Univ, Dept Bioinformat & Med Engn, Taichung, Taiwan
[2] Asia Univ, Dept Bioinformat & Med Engn, Ctr Precis Hlth Res, Taichung, Taiwan
Keywords
antimicrobial peptide prediction; therapeutic peptide; disease; machine learning; bioinformatics; AMINO-ACID-COMPOSITION; FEATURE-SELECTION; PROTEIN; ANTIBACTERIAL;
DOI
10.1109/BIBE55377.2022.00053
Chinese Library Classification number
R318 [Biomedical Engineering];
Discipline classification code
0831 ;
Abstract
Antimicrobial resistance is one of the most serious issues for human health. Compared to existing antibiotics, antimicrobial peptides (AMPs) have the advantage of efficiently killing microbes and other pathogens without inducing drug resistance. Large-scale experimental methods to characterize AMPs require wet-lab resources and more time. In silico prediction of AMPs, on the other hand, is an attractive strategy to lower the cost and time of discovering new AMPs. In this study, we propose a CatBoost model for AMP prediction. We included various features for the numerical representation of peptides, and then employed a systematic approach to select 130 important features for our machine learning models. The CatBoost model achieves an accuracy, F1-score, MCC, and AUC of 0.758, 0.750, 0.518, and 0.831, respectively, in cross-validation. In an independent test on 188 peptide sequences, the proposed model achieves an accuracy, MCC, and AUC of 0.814, 0.632, and 0.884, respectively, all of which are the best when compared with five state-of-the-art methods. Relative to those five methods, our model improves the MCC by 2.6% to 21.1% and the AUC by 1.3% to 13.3%. The results demonstrate that our CatBoost model is capable of yielding reliable results and can be of great help in discovering novel AMPs.
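The abstract names the evaluation metrics but not the individual features. As a rough illustration only, the Python sketch below shows one common numerical representation of a peptide (amino-acid composition, suggested by the keyword list but not confirmed as one of the 130 selected features) and how accuracy, F1-score, and MCC (Matthews correlation coefficient) follow from binary predictions. The function names `aa_composition` and `binary_metrics` are hypothetical, and the sketch does not reproduce the authors' CatBoost pipeline.

```python
from collections import Counter
from math import sqrt

# The 20 standard amino acids, in a fixed order so feature vectors align.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq):
    """Return the fraction of each standard amino acid in a peptide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def binary_metrics(y_true, y_pred):
    """Compute accuracy, F1-score, and MCC for 0/1 labels (1 = AMP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    # MCC is reported as 0 when any marginal count is zero (a common convention).
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}
```

In a full pipeline, vectors like those from `aa_composition` would be stacked into a feature matrix and passed to a classifier; AUC additionally requires predicted probabilities rather than hard labels, so it is left out of this sketch.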
Pages: 217-220
Page count: 4
Related papers
50 in total
  • [21] Sequence-based feature prediction and annotation of proteins
    Agnieszka S Juncker
    Lars J Jensen
    Andrea Pierleoni
    Andreas Bernsel
    Michael L Tress
    Peer Bork
    Gunnar von Heijne
    Alfonso Valencia
    Christos A Ouzounis
    Rita Casadio
    Søren Brunak
    Genome Biology, 10
  • [22] Accurate sequence-based prediction of catalytic residues
    Zhang, Tuo
    Zhang, Hua
    Chen, Ke
    Shen, Shiyi
    Ruan, Jishou
    Kurgan, Lukasz
    BIOINFORMATICS, 2008, 24 (20) : 2329 - 2338
  • [23] A sequence-based computational method for prediction of MoRFs
    Wang, Yu
    Guo, Yanzhi
    Pu, Xuemei
    Li, Menglong
    RSC ADVANCES, 2017, 7 (31) : 18937 - 18945
  • [24] Sequence-based prediction in conceptual design of bridges
    Wang, Weiyuan
    Gero, John S.
    Journal of Computing in Civil Engineering, 1997, 2 (01) : 37 - 43
  • [25] Sequence-Based Prediction of Fuzzy Protein Interactions
    Miskei, Marton
    Horvath, Attila
    Vendruscolo, Michele
    Fuxreiter, Monika
    JOURNAL OF MOLECULAR BIOLOGY, 2020, 432 (07) : 2289 - 2303
  • [26] Sequence-Based Prediction of Metamorphic Behavior in Proteins
    Chen, Nanhao
    Das, Madhurima
    LiWang, Andy
    Wang, Lee-Ping
    BIOPHYSICAL JOURNAL, 2020, 119 (07) : 1380 - 1390
  • [27] Sequence-based Structured Prediction for Semantic Parsing
    Xiao, Chunyang
    Dymetman, Marc
    Gardent, Claire
    PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1, 2016, : 1341 - 1350
  • [28] Sequence-Based Prediction of Olfactory Receptor Responses
    Chepurwar, Shashank
    Gupta, Abhishek
    Haddad, Rafi
    Gupta, Nitin
    CHEMICAL SENSES, 2019, 44 (09) : 693 - 703
  • [29] Sequence-based prediction in conceptual design of bridges
    Wang, WY
    Gero, JS
    JOURNAL OF COMPUTING IN CIVIL ENGINEERING, 1997, 11 (01) : 37 - 43
  • [30] Sequence-based feature prediction and annotation of proteins
    Juncker, Agnieszka S.
    Jensen, Lars J.
    Pierleoni, Andrea
    Bernsel, Andreas
    Tress, Michael L.
    Bork, Peer
    von Heijne, Gunnar
    Valencia, Alfonso
    Ouzounis, Christos A.
    Casadio, Rita
    Brunak, Soren
    GENOME BIOLOGY, 2009, 10 (02) : 206