Improving Protein Fold Recognition by Deep Learning Networks

Cited by: 92
Authors
Jo, Taeho [1 ,2 ]
Hou, Jie [1 ]
Eickholt, Jesse [3 ]
Cheng, Jianlin [1 ]
Affiliations
[1] Univ Missouri, Dept Comp Sci, Columbia, MO 65211 USA
[2] Univ Michigan, Dept Biol Chem, Ann Arbor, MI 48109 USA
[3] Cent Michigan Univ, Dept Comp Sci, Mt Pleasant, MI 48859 USA
Keywords
HIDDEN MARKOV-MODELS; PROFILE; INFORMATION; PREDICTION; ALGORITHM; ALIGNMENT; DATABASE;
DOI
10.1038/srep17573
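Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes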
07 ; 0710 ; 09 ;
Abstract
For accurate recognition of protein folds, a deep learning network method (DN-Fold) was developed to predict whether a given query-template protein pair belongs to the same structural fold. The inputs were derived from protein sequence and structural features extracted from the protein pair. We evaluated the performance of DN-Fold along with 18 different methods on Lindahl's benchmark dataset and on a large benchmark set extracted from SCOP 1.75 consisting of about one million protein pairs, at three different levels of fold recognition (i.e., protein family, superfamily, and fold) depending on the evolutionary distance between protein sequences. The correct recognition rate of the ensemble DN-Fold for Top 1 predictions is 84.5%, 61.5%, and 33.6%, and for Top 5 predictions is 91.2%, 76.5%, and 60.7%, at the family, superfamily, and fold levels, respectively. We also evaluated the performance of a single DN-Fold (DN-FoldS), which showed results comparable to the ensemble DN-Fold at the family and superfamily levels. Finally, we extended the binary classification problem of fold recognition to a real-valued regression task, which also showed promising performance. DN-Fold is freely available through a web server at http://iris.rnet.missouri.edu/dnfold.
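The Top 1 and Top 5 recognition rates quoted in the abstract can be illustrated with a minimal sketch. It assumes each query-template pair carries a predicted score and a same-fold label, and that a query counts as recognized when at least one of its k highest-scoring templates shares its fold. The function name `top_k_recognition_rate`, the tuple layout, and the toy data are hypothetical and are not taken from DN-Fold; the paper's exact evaluation protocol (for example, which queries are counted at each level) may differ.

```python
from collections import defaultdict


def top_k_recognition_rate(pairs, k):
    """Fraction of queries with a same-fold template among their top-k scores.

    pairs: iterable of (query_id, template_id, score, is_same_fold).
    This layout is an assumption made for illustration only.
    """
    by_query = defaultdict(list)
    for query_id, _template_id, score, is_same_fold in pairs:
        by_query[query_id].append((score, is_same_fold))
    if not by_query:
        return 0.0

    hits = 0
    for scored in by_query.values():
        # Rank this query's templates from highest to lowest predicted score.
        ranked = sorted(scored, key=lambda item: item[0], reverse=True)
        # The query is recognized if any of the top-k templates is a true match.
        if any(label for _score, label in ranked[:k]):
            hits += 1
    return hits / len(by_query)


# Toy data (hypothetical): two queries scored against a few templates.
toy_pairs = [
    ("q1", "t1", 0.9, True),
    ("q1", "t2", 0.4, False),
    ("q2", "t1", 0.8, False),
    ("q2", "t3", 0.7, True),
    ("q2", "t4", 0.1, False),
]
top1 = top_k_recognition_rate(toy_pairs, 1)
top2 = top_k_recognition_rate(toy_pairs, 2)
```

In this toy case only `q1` has a correct template ranked first, while both queries have one within the top two, so the rate rises as k grows, which matches the pattern of the Top 1 versus Top 5 figures above.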
Pages: 11