How significant is a protein structure similarity with TM-score=0.5?

Cited by: 620
Authors
Xu, Jinrui [1, 2]
Zhang, Yang [1, 2]
Affiliations
[1] Univ Michigan, Ctr Computat Med & Bioinformat, Dept Med Sch, Ann Arbor, MI 48109 USA
[2] Univ Kansas, Ctr Bioinformat, Dept Mol Biosci, Lawrence, KS 66047 USA
Funding
US National Science Foundation
Keywords
TEMPLATE FREE TARGETS; STRUCTURE PREDICTIONS; CLASSIFICATION; DATABASE; QUALITY; CATH; SCOP; DIVERGENCE; MODELS; ORIGIN;
DOI
10.1093/bioinformatics/btq066
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Protein structure similarity is often measured by root mean squared deviation, global distance test score and template modeling score (TM-score). However, the scores themselves do not indicate how statistically significant a structural similarity is, and a quantitative relation between the scores and conventional fold classifications is lacking. This article aims to answer two questions: (i) What is the statistical significance of a TM-score? (ii) What is the probability of two proteins having the same fold given a specific TM-score?
Results: We first made an all-to-all gapless structural match on 6684 non-homologous single-domain proteins in the PDB and found that the TM-scores follow an extreme value distribution. The data allow us to assign each TM-score a P-value that measures the chance of two randomly selected proteins obtaining an equal or higher TM-score. A TM-score of 0.5, for instance, has a P-value of 5.5 × 10^-7, which means that on average about 1.8 million random protein pairs must be considered to obtain one TM-score of no less than 0.5. Second, we examine the posterior probability of two proteins sharing the same fold using three datasets: SCOP, CATH and the consensus of SCOP and CATH. The posterior probabilities from the different datasets show a similarly rapid phase transition around TM-score = 0.5. This finding indicates that the TM-score can be used as an approximate but quantitative criterion for protein topology classification, i.e. protein pairs with a TM-score >0.5 are mostly in the same fold, while those with a TM-score <0.5 are mainly not in the same fold.
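The arithmetic behind the "1.8 million pairs" figure, and the general shape of an extreme-value tail probability, can be sketched as follows. This is a minimal illustration and not the authors' code: the function names are hypothetical, the Gumbel (type I extreme value) form is assumed as the distribution family, and the location and scale values in the usage lines are arbitrary placeholders, not the parameters fitted in the paper.

```python
import math


def pairs_needed(p_value: float) -> float:
    """Expected number of random pairs to examine before one reaches the
    score threshold, taken as the reciprocal of the P-value."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in (0, 1]")
    return 1.0 / p_value


def gumbel_tail(score: float, loc: float, scale: float) -> float:
    """P(X >= score) for a Gumbel distribution with the given location and
    scale. Assumed form only; the paper's fitted parameters are not used."""
    if scale <= 0.0:
        raise ValueError("scale must be positive")
    z = (score - loc) / scale
    # 1 - exp(-exp(-z)), written with expm1 to stay accurate for tiny tails.
    return -math.expm1(-math.exp(-z))


# The P-value quoted in the abstract for TM-score = 0.5.
print(f"{pairs_needed(5.5e-7):,.0f}")  # roughly 1.8 million

# Placeholder parameters (hypothetical): the tail shrinks as the score grows.
print(gumbel_tail(0.3, loc=0.2, scale=0.05))
print(gumbel_tail(0.5, loc=0.2, scale=0.05))
```

The reciprocal gives the mean of a geometric waiting time, which is why "about 1.8 million pairs" is an average rather than a guaranteed minimum.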
Pages: 889-895
Page count: 7