DEEPre: sequence-based enzyme EC number prediction by deep learning

Cited by: 193
Authors
Li, Yu [1]
Wang, Sheng [1]
Umarov, Ramzan [1]
Xie, Bingqing [2]
Fan, Ming [3]
Li, Lihua [3]
Gao, Xin [1]
Affiliations
[1] KAUST, CBRC, Elect & Math Sci & Engn Div CEMSE, Thuwal 23955-6900, Saudi Arabia
[2] IIT, Dept Comp Sci, Chicago, IL 60616 USA
[3] Hangzhou Dianzi Univ, Inst Biomed Engn & Instrumentat, Hangzhou 310018, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
AMINO-ACID-COMPOSITION; SUPPORT VECTOR MACHINE; HIERARCHICAL-CLASSIFICATION; ACCURATE PREDICTION; PROTEIN-STRUCTURE; SUBFAMILY CLASS; FAMILY CLASSES; DATABASE; ANNOTATION; GENERATION;
DOI
10.1093/bioinformatics/btx680
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Annotation of enzyme function has a broad range of applications, such as metagenomics, industrial biotechnology, and diagnosis of enzyme deficiency-caused diseases. However, the time and resources required make it prohibitively expensive to experimentally determine the function of every enzyme. Therefore, computational enzyme function prediction has become increasingly important. In this paper, we develop such an approach, determining the enzyme function by predicting the Enzyme Commission (EC) number. Our method, DEEPre, combines an end-to-end approach to feature selection and classification model training with an automatic and robust method for making feature dimensionality uniform. Instead of extracting manually crafted features from enzyme sequences, our model takes the raw sequence encoding as input and extracts convolutional and sequential features from it, guided by the classification result, to directly improve prediction performance. Thorough cross-fold validation experiments conducted on two large-scale datasets show that DEEPre improves the prediction performance over the previous state-of-the-art methods. In addition, our server outperforms five other servers in determining the main class of enzymes on a separate low-homology dataset. Two case studies demonstrate DEEPre's ability to capture the functional difference of enzyme isoforms. The server can be accessed freely at http://www.cbrc.kaust.edu.sa/DEEPre.
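As a rough illustration of what a raw sequence encoding with a uniform dimensionality can look like, the sketch below one-hot encodes a protein sequence into a fixed-size matrix and reads the main class from an EC number. It is not DEEPre's actual encoding or pipeline. The function names, the zero-padding and truncation scheme, and the `max_len` parameter are assumptions made for this example only.

```python
# Illustrative sketch only: not the encoding used by DEEPre.
# Names, padding scheme, and max_len are assumptions for this example.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence, max_len):
    """Return a max_len x 20 matrix (list of lists) for a protein sequence.

    Residues outside the 20 standard letters become all-zero rows.
    Longer sequences are truncated; shorter ones are zero-padded,
    so every sequence yields a matrix of the same shape.
    """
    matrix = []
    for residue in sequence.upper()[:max_len]:
        row = [0] * len(AMINO_ACIDS)
        idx = AA_INDEX.get(residue)
        if idx is not None:
            row[idx] = 1
        matrix.append(row)
    while len(matrix) < max_len:
        matrix.append([0] * len(AMINO_ACIDS))
    return matrix


def main_class(ec_number):
    """Return the first (main class) level of a four-level EC number."""
    parts = ec_number.split(".")
    if len(parts) != 4:
        raise ValueError("expected four dot-separated levels: " + ec_number)
    return int(parts[0])


if __name__ == "__main__":
    encoded = one_hot_encode("MKV", max_len=5)
    print(len(encoded), len(encoded[0]))
    print(main_class("3.4.21.4"))
```

A fixed `max_len` is the simplest way to give every sequence the same input shape. It discards the tail of long sequences, which is one reason a real system might prefer a more careful uniformization step.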
Pages: 760-769
Number of pages: 10