ODiNPred: comprehensive prediction of protein order and disorder

Cited by: 61
Authors
Dass, Rupashree [1]
Mulder, Frans A. A. [1, 2]
Nielsen, Jakob Toudahl [1, 2]
Affiliations
[1] Aarhus Univ, Interdisciplinary Nanosci Ctr iNANO, Gustav Wieds Vej 14, DK-8000 Aarhus C, Denmark
[2] Aarhus Univ, Dept Chem, Langelandsgade 140, DK-8000 Aarhus C, Denmark
Keywords
TRANSACTIVATION DOMAIN INTERACTION; PRION PROTEIN; WEB SERVER; STRUCTURAL-CHARACTERIZATION; SECONDARY STRUCTURE; INTRINSIC DISORDER; BINDING REGIONS; TERMINAL DOMAIN; PHASE-SEPARATION; CHEMICAL-SHIFTS;
DOI
10.1038/s41598-020-71716-1
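Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07; 0710; 09;
Abstract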
Structural disorder is widespread in eukaryotic proteins and is vital for their function in diverse biological processes. It is therefore highly desirable to be able to predict the degree of order and disorder from amino acid sequence. It is, however, notoriously difficult to predict the degree of local flexibility within structured domains and the presence and nuances of localized rigidity within intrinsically disordered regions. To identify such instances, we used the CheZOD database, which encompasses accurate, balanced, and continuous-valued quantification of protein (dis)order at amino acid resolution based on NMR chemical shifts. To computationally forecast the spectrum of protein disorder in the most comprehensive manner possible, we constructed the sequence-based protein order/disorder predictor ODiNPred, trained on an expanded version of CheZOD. ODiNPred applies a deep neural network comprising 157 unique sequence features to 1325 protein sequences together with the experimental NMR chemical shift data. Cross-validation for 117 protein sequences shows that ODiNPred better predicts the continuous variation in order along the protein sequence, suggesting that contemporary predictors are limited by the quality of training data. The inclusion of evolutionary features reduces the performance gap between ODiNPred and its peers, but analysis shows that it retains greater accuracy for the more challenging prediction of intermediate disorder.
Pages: 16