EnAMP: A novel deep learning ensemble antibacterial peptide recognition algorithm based on multi-features

Cited by: 4
Authors
Zhuang, Jujuan [1]
Gao, Wanquan [1]
Su, Rui [1]
Affiliation
[1] Dalian Maritime University, School of Science, Dalian, Liaoning, People's Republic of China
Keywords
Antimicrobial peptide prediction; word embedding; deep learning; machine learning; ensemble learning; antimicrobial peptides; prediction
DOI
10.1142/S021972002450001X
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Antimicrobial peptides (AMPs), regarded as preferred alternatives to antibiotics, have wide applications and good prospects. Identifying AMPs through wet-lab experiments remains expensive, time-consuming and challenging. Many machine learning methods have been proposed to predict AMPs and have achieved good results. In this work, we combine two kinds of word embedding features with statistical features of peptide sequences to build an ensemble classifier named EnAMP. Two deep neural networks are trained on Word2vec and GloVe word embedding features of the peptide sequences, respectively, while the statistical features are used to train a random forest classifier and a support vector machine classifier. The final prediction is the average of the outputs of these four classifiers. Compared with other state-of-the-art algorithms on six datasets, EnAMP outperforms most existing models of similar computational cost, and its performance is comparable even to that of computationally expensive algorithms based on Bidirectional Encoder Representations from Transformers (BERT). The EnAMP source code and data are available at https://github.com/ruisue/EnAMP.
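The abstract states that EnAMP's final prediction is the average of its four classifiers' outputs. The Python sketch below illustrates that averaging step under several assumptions that are not taken from the paper: each base model is assumed to output a per-peptide AMP probability, the four probabilities are combined by an unweighted mean (soft voting), and a 0.5 decision threshold is applied. Amino acid composition is used only as a hypothetical example of a "statistical feature". The helper names `aa_composition`, `ensemble_average` and `to_labels`, the model labels, and the probability values are all illustrative and do not come from the EnAMP repository; the trained networks, random forest and SVM are represented only by made-up numbers.

```python
from typing import Dict, List, Sequence

# The 20 standard amino acids, in a fixed order so feature vectors line up.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(peptide: str) -> List[float]:
    """Relative frequency of each standard amino acid in a peptide.

    Hypothetical stand-in for EnAMP's "statistical features"; the paper's
    actual feature set may differ.
    """
    seq = peptide.strip().upper()
    if not seq:
        raise ValueError("empty peptide sequence")
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def ensemble_average(probs_by_model: Dict[str, Sequence[float]]) -> List[float]:
    """Unweighted mean of per-peptide AMP probabilities across base models (assumed soft voting)."""
    if not probs_by_model:
        raise ValueError("no model outputs given")
    columns = list(probs_by_model.values())
    n_samples = len(columns[0])
    if any(len(col) != n_samples for col in columns):
        raise ValueError("every model must score the same peptides")
    return [sum(col[i] for col in columns) / len(columns) for i in range(n_samples)]


def to_labels(avg_probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Call a peptide an AMP (1) when its averaged probability reaches the assumed threshold."""
    return [1 if p >= threshold else 0 for p in avg_probs]


if __name__ == "__main__":
    # Made-up probabilities for three peptides from the four base models;
    # the dictionary keys are hypothetical labels, not identifiers from the EnAMP code.
    outputs = {
        "dnn_word2vec": [0.9, 0.2, 0.6],
        "dnn_glove": [0.8, 0.1, 0.4],
        "random_forest": [0.7, 0.3, 0.5],
        "svm": [0.9, 0.2, 0.3],
    }
    avg = ensemble_average(outputs)
    print([round(p, 3) for p in avg])  # [0.825, 0.2, 0.45]
    print(to_labels(avg))  # [1, 0, 0]
    # An arbitrary example sequence; the composition vector has one entry per residue type.
    print(len(aa_composition("GLFDIVKKVV")))  # 20
```

Plain averaging gives each of the four models equal weight, so no additional parameters have to be fitted on top of the base classifiers. Whether EnAMP averages probabilities or hard labels, and which threshold it uses, should be checked against the released code.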
Pages: 16