Sequence-based peptide identification, generation, and property prediction with deep learning: a review

Times cited: 32
Authors
Chen, Xumin [1]
Li, Chen [1]
Bernards, Matthew T. [2]
Shi, Yao [1,3]
Shao, Qing [4]
He, Yi [1,5]
Affiliations
[1] Zhejiang Univ, Coll Chem & Biol Engn, Hangzhou 310027, Peoples R China
[2] Univ Idaho, Dept Chem & Biol Engn, Moscow, ID 83844 USA
[3] Zhejiang Univ, Minist Educ, Key Lab Biomass Chem Engn, Hangzhou, Peoples R China
[4] Univ Kentucky, Chem & Mat Engn Dept, Lexington, KY 40506 USA
[5] Univ Washington, Dept Chem Engn, Seattle, WA 98195 USA
Funding
National Natural Science Foundation of China
Keywords
NEURAL-NETWORK MODEL; CONDITIONAL VARIATIONAL AUTOENCODER; IDENTIFYING ANTIMICROBIAL PEPTIDES; DATABASE; DESIGN; CLASSIFICATION; PROTEINS; SITES; LSTM; TOOL;
DOI
10.1039/d0me00161a
Chinese Library Classification
O64 [Physical chemistry (theoretical chemistry), chemical physics]
Discipline classification codes
070304; 081704
Abstract
Over the past few years, deep learning has demonstrated itself to be a powerful tool in many areas, especially bioinformatics. With its previous success in DNA and protein related studies, deep learning has now been brought to the field of peptide science as well. It has been widely used in sequence-based peptide identification, generation, and property prediction. The publications on this subject over the past two years are summarized in this review. The deep learning models reported are mainly convolutional neural networks, recurrent neural networks, hybrid models, transformers, and other generative models like variational autoencoders and generative adversarial networks, as well as algorithms like input optimization. Application areas include antimicrobial peptides, signal peptides, and major histocompatibility complex binding peptides, among others. This review develops content according to the general workflow of deep learning, while illustrating adaptations and techniques specific to certain example problems. Some issues and future directions are also discussed, such as approaches for model interpretation, benchmark datasets, automation in deep learning, and rational peptide design techniques.
Pages: 406-428
Page count: 23