Within the twilight zone: A sensitive profile-profile comparison tool based on information theory

Cited by: 203
Authors
Yona, G [1]
Levitt, M [1]
Affiliations
[1] Stanford Univ, Dept Biol Struct, Stanford, CA 94305 USA
Keywords
profile-profile comparison; PSI-BLAST; structural alignment; remote homologies
DOI
10.1006/jmbi.2001.5293
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
This paper presents a novel approach to profile-profile comparison. The method compares two input profiles (like those that are generated by PSI-BLAST) and assigns a similarity score to assess their statistical similarity. Our profile-profile comparison tool, which allows for gaps, can be used to detect weak similarities between protein families. It has also been optimized to produce alignments that are in very good agreement with structural alignments. Tests show that the profile-profile alignments are indeed highly correlated with similarities between secondary structure elements and tertiary structure. Exhaustive evaluations show that our method is significantly more sensitive in detecting distant homologies than the popular profile-based search programs PSI-BLAST and IMPALA. The relative improvement is the same order of magnitude as the improvement of PSI-BLAST relative to BLAST. Our new tool often detects similarities that fall within the twilight zone of sequence similarity. © 2002 Elsevier Science Ltd.
Pages: 1257-1275
Page count: 19