Measurement of the effectiveness of transitive sequence comparison, through a third 'intermediate' sequence

Cited by: 41

Authors:

Gerstein, M [1]

Affiliations:

[1] Yale Univ, Dept Mol Biophys & Biochem, New Haven, CT 06520 USA

Source:

BIOINFORMATICS | 1998 / Volume 14 / Issue 08

DOI:

10.1093/bioinformatics/14.8.707

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Discipline classification code:

071010 ; 081704 ;

Abstract:

Motivation: Transitive sequence matching expands the scope of sequence comparison by re-running the results of a given query against the databank as new queries. This sometimes results in the initial query sequence (Q) being related to a final match (M) indirectly, through a third, 'intermediate' sequence (Q --> I --> M). This approach has often been suggested as providing greater sensitivity in sequence comparison; however, it has not yet been possible to gauge its improvement precisely.

Results: Here, this improvement is comprehensively measured by seeing what fraction of the known structural relationships transitive sequence matching can uncover beyond that found by normal pairwise comparison (i.e. direct linkage). The structural relationships are taken from a well-characterized test set, the scop classification of protein structure. Specifically, 2055 known structural similarities (called 'pairs') between distantly related proteins constitute the basic test set. To measure transitive matching properly, special data sets, called 'baseline sets', are derived from this. They consist of pairs of sequences that have a clear structural relationship that cannot be found by normal sequence comparison (i.e. they cannot be directly linked). Specifically, using standard sequence comparison protocols (FASTA with an e-value cut-off of 0.001), it is found that the baseline set consists of 1742 pairs. A third intermediate sequence can link 86 of these indirectly (5%), where this third sequence is drawn from the entire, current universe of protein sequences. The number of false positives is minimal. Furthermore, when one considers only the relationships within the test set that correspond to a close structural alignment, the coverage increases considerably. In particular, 862 of the baseline set pairs fit to better than 2.6 Angstrom RMS, and transitive matching can find 62 of these (9%).
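The Q --> I --> M linkage described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's actual pipeline: the function names (`build_links`, `transitive_matches`, `coverage`), the toy hit list, and the choice to treat matches as symmetric are all assumptions made for illustration. Only the e-value cut-off of 0.001 and the counts 86 and 1742 come from the abstract.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 0.001  # cut-off quoted in the abstract


def build_links(hits, cutoff=E_VALUE_CUTOFF):
    """Map each sequence to the set of sequences it matches directly.

    `hits` is an iterable of (seq_a, seq_b, e_value) tuples (hypothetical format).
    """
    links = defaultdict(set)
    for a, b, e_value in hits:
        if a != b and e_value < cutoff:
            links[a].add(b)
            links[b].add(a)  # assumption: a match is treated as symmetric
    return links


def transitive_matches(pairs, links):
    """Return {(Q, M): intermediates} for pairs linked only through a third sequence."""
    found = {}
    for q, m in pairs:
        q_hits = links.get(q, set())
        m_hits = links.get(m, set())
        if m in q_hits:
            continue  # directly linked, so not a baseline pair
        intermediates = (q_hits & m_hits) - {q, m}
        if intermediates:
            found[(q, m)] = intermediates
    return found


def coverage(n_found, n_baseline):
    """Fraction of baseline pairs recovered by transitive matching."""
    return n_found / n_baseline


# Toy data, invented purely for illustration.
hits = [
    ("Q1", "I1", 1e-5),   # Q1 matches an intermediate ...
    ("I1", "M1", 1e-4),   # ... which matches M1
    ("Q2", "M2", 1e-10),  # direct link: excluded from the baseline
    ("Q3", "I3", 0.5),    # too weak to count as a match
    ("I3", "M3", 1e-6),
]
pairs = [("Q1", "M1"), ("Q2", "M2"), ("Q3", "M3")]

links = build_links(hits)
result = transitive_matches(pairs, links)
print(result)                         # only (Q1, M1) is linked, via I1
print(f"{coverage(86, 1742):.1%}")    # figures from the abstract
```

In this toy run only the pair (Q1, M1) is recovered, because Q2 and M2 are already directly linked and the Q3 to I3 hit fails the cut-off. The last line recomputes the 86 of 1742 coverage reported in the abstract, which rounds to 5%.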

Pages: 707-714

Page count: 8

References (52 in total):

[1] Do aligned sequences share the same fold? [J].