Analysis of superfamily specific profile-profile recognition accuracy

Cited by: 5
Authors
Casbon, JA [1]
Saqi, MAS [1]
Affiliations
[1] Univ London, Queen Marys Sch Med & Dent, Inst Cell & Mol Sci, Ctr Infect Dis,Bioinformat Grp, London E1 2AA, England
DOI
10.1186/1471-2105-5-200
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Annotation of sequences that share little similarity to sequences of known function remains a major obstacle in genome annotation. Some of the best methods of detecting remote relationships between protein sequences are based on matching sequence profiles. We analyse the superfamily-specific performance of sequence profile-profile matching. Our benchmark consists of a set of 16 protein superfamilies that are highly diverse at the sequence level. We relate the performance to the number of sequences in the profiles, the profile diversity and the extent of structural conservation in the superfamily.
Results: The performance varies greatly between superfamilies: the truncated receiver operating characteristic score, ROC10, ranges from 0.95 down to 0.01. These large differences persist even when the profiles are trimmed to approximately the same level of diversity.
Conclusions: Although the number of sequences in the profile (profile width) and the degree of sequence variation within positions in the profile (profile diversity) contribute to accurate detection, there are other superfamily-specific factors.
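The abstract reports performance as ROC10 but does not define the score. As a minimal sketch, assuming the commonly used definition of the truncated ROC score (for each of the first n false positives, count the true positives ranked above it, then normalise by n times the total number of true positives), the calculation might look like the following. The function name `roc_n`, its parameters, and the padding rule for hit lists with fewer than n false positives are illustrative assumptions, not taken from the paper.

```python
def roc_n(ranked_labels, n=10, total_positives=None):
    """Truncated ROC score for a ranked hit list (hypothetical helper).

    ranked_labels: booleans ordered best hit first; True marks a true homolog.
    n: number of false positives at which the curve is truncated.
    total_positives: number of true homologs in the database; defaults to
        the number of True entries in ranked_labels (an assumption).
    """
    if total_positives is None:
        total_positives = sum(ranked_labels)
    if total_positives == 0:
        raise ValueError("at least one true positive is required")

    true_seen = 0   # true positives ranked so far
    false_seen = 0  # false positives ranked so far
    area = 0        # sum of true positives ranked above each false positive
    for is_true in ranked_labels:
        if is_true:
            true_seen += 1
        else:
            false_seen += 1
            area += true_seen
            if false_seen == n:
                break

    # Assumption: if the list holds fewer than n false positives, the missing
    # ones are treated as ranking below every listed hit.
    area += (n - false_seen) * true_seen
    return area / (n * total_positives)
```

Under this definition a score of 1.0 means every true homolog ranks above the first false positive, and 0.0 means none ranks above the n-th false positive, which gives a sense of how wide the reported range from 0.95 to 0.01 is.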
Pages: 9