Benchmarking of TASSER in the ab initio limit

Cited by: 5
Authors
Borreguero, Jose M. [1]
Skolnick, Jeffrey [1]
Affiliations
[1] Georgia Inst Technol, Sch Biochem, Ctr Study Syst Biol, Atlanta, GA 30318 USA
Keywords
ab initio folding; protein folding; protein structure prediction
DOI
10.1002/prot.21392
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
A significant number of protein sequences in a given proteome have no obvious evolutionarily related protein in the database of solved protein structures, the PDB. Under these conditions, ab initio or template-free modeling methods are the sole means of predicting protein structure. To assess its expected performance on proteomes, the TASSER structure prediction algorithm is benchmarked in the ab initio limit on a representative set of 1129 nonhomologous sequences ranging from 40 to 200 residues that cover the PDB at 30% sequence identity and that adopt alpha, alpha + beta, and beta secondary structures. For sequences in the 40-100 (100-200) residue range, as assessed by their root mean square deviation from native (RMSD), the best of the top five ranked models of TASSER has a global fold that is significantly close to the native structure for 25% (16%) of the sequences, and correctly identifies the structure of the protein core for 59% (36%). In the absence of a native structure, the structural similarity among the top five ranked models is a moderately reliable predictor of folding accuracy. If we classify the sequences according to their secondary structure content, then 64% (36%) of alpha, 43% (24%) of alpha + beta, and 20% (12%) of beta sequences in the 40-100 (100-200) residue range have a significant TM-score (TM-score >= 0.4). TASSER performs best on helical proteins because there are fewer secondary structural elements to arrange in a helical protein than in a beta protein of equal length, since the average length of a helix is longer than that of a strand. In addition, helical proteins have shorter loops and dangling tails. If we exclude these flexible fragments, then TASSER has similar accuracy for sequences containing the same number of secondary structural elements, irrespective of whether they are helices and/or strands. Thus, it is the effective configurational entropy of the protein that dictates the average likelihood of correctly arranging the secondary structure elements.
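The abstract uses TM-score >= 0.4 as its significance criterion. As a rough illustration only, the sketch below evaluates the standard TM-score sum for one fixed superposition of a model onto the native structure. It is not taken from the paper or from TASSER: the function names are hypothetical, the per-residue distances are assumed to have been computed elsewhere, and the 0.5 angstrom floor on d0 for very short chains is an assumption. The actual TM-score is the maximum of this quantity over all superpositions, which the sketch does not search for.

```python
def tm_d0(native_length):
    """Length-dependent distance scale d0 (angstroms) used by the TM-score.

    Standard form: d0 = 1.24 * (L - 15)**(1/3) - 1.8.
    The 0.5 floor for short chains is an assumption of this sketch.
    """
    if native_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (native_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_for_superposition(distances, native_length):
    """TM-score sum for ONE given superposition (no maximization).

    distances: distances in angstroms between aligned residue pairs
               (model vs. native) after superposition; hypothetical input.
    native_length: number of residues in the native structure.
    """
    d0 = tm_d0(native_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / native_length


def is_significant(tm_score, threshold=0.4):
    """Apply the TM-score >= 0.4 cutoff quoted in the abstract."""
    return tm_score >= threshold


if __name__ == "__main__":
    # Toy case: 100-residue native, 60 residues matched at 1 angstrom,
    # 40 residues far away at 15 angstroms.
    toy = [1.0] * 60 + [15.0] * 40
    score = tm_score_for_superposition(toy, 100)
    print(round(score, 3), is_significant(score))
```

Because each residue contributes at most 1/L and distant residues contribute almost nothing, the score depends only weakly on poorly placed loops and tails, unlike RMSD.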
Pages: 48-56
Number of pages: 9