YAKUSA: A fast structural database scanning method

Cited by: 51

Authors:

Carpentier, M [1]

Brouillet, S [1]

Pothier, J [1]

Affiliation:

[1] Univ Paris 06, Atelier BioInformat, F-75005 Paris, France

Source:

PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS | 2005 / Vol. 61 / Issue 01

Keywords:

protein structural similarities; protein internal coordinates; mixture transition distribution model;

DOI:

10.1002/prot.20517

Chinese Library Classification (CLC) number:

Q5 [Biochemistry]; Q7 [Molecular Biology];

Discipline classification code:

071010 ; 081704 ;

Abstract:

YAKUSA is a program designed for rapid scanning of a structural database with a query protein structure. It searches for the longest common substructures, called SHSPs (structural high-scoring pairs), between the query structure and every structure in the database. It uses protein backbone internal coordinates (alpha angles) to describe protein structures as sequences of symbols. The structural similarities are established in 5 steps, the first 3 being analogous to those used in BLAST: (1) building a deterministic finite automaton describing all patterns identical or similar to those in the query structure; (2) searching for all these patterns in every structure in the database; (3) extending the patterns to longer matching substructures (i.e., SHSPs); (4) selecting compatible SHSPs for each query-database structure pair; and (5) ranking the query-database structure pairs using 3 scores based on SHSP similarity, on SHSP probabilities, and on spatial compatibility of SHSPs. Structural fragment probabilities are estimated according to a mixture transition distribution model, which is an approximation of a high-order Markov chain model. In sensitivity and selectivity of the structural matches, YAKUSA compares well with the best related programs while being considerably faster: a typical database scan takes about 40 s of CPU time on a desktop personal computer. It has also been implemented on a Web server for real-time searches.
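The encoding and matching idea in the abstract can be illustrated with a minimal Python sketch. It is not the YAKUSA implementation. It assumes that an alpha angle is the dihedral angle defined by four consecutive Cα atoms, that angles are discretized into 10-degree bins (a hypothetical bin width), and it replaces the automaton-based search and BLAST-like extension with a naive diagonal scan for ungapped runs of similar symbols. All function names and parameters (`alpha_angles`, `encode`, `common_runs`, `tol`, `min_len`) are illustrative assumptions.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees, in (-180, 180], for four 3D points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))

def alpha_angles(ca_coords):
    """One dihedral per window of four consecutive C-alpha coordinates."""
    return [dihedral(*ca_coords[i:i + 4]) for i in range(len(ca_coords) - 3)]

def encode(angles, bin_width=10.0):
    """Map each angle to an integer symbol (assumed 10-degree bins)."""
    n_bins = int(360 // bin_width)
    return [int((a + 180.0) // bin_width) % n_bins for a in angles]

def similar(a, b, n_bins, tol=1):
    """Symbols are similar if their circular bin distance is at most tol."""
    d = abs(a - b) % n_bins
    return min(d, n_bins - d) <= tol

def common_runs(query, target, n_bins=36, min_len=3, tol=1):
    """Naive stand-in for seed search plus ungapped extension.

    Scans every diagonal and returns (query_start, target_start, length)
    for each maximal run of similar symbols of at least min_len.
    """
    runs = []
    for d in range(-(len(query) - 1), len(target)):
        i, j = (-d, 0) if d < 0 else (0, d)
        start = None
        while i < len(query) and j < len(target):
            if similar(query[i], target[j], n_bins, tol):
                if start is None:
                    start = (i, j)
            else:
                if start is not None and i - start[0] >= min_len:
                    runs.append((start[0], start[1], i - start[0]))
                start = None
            i += 1
            j += 1
        if start is not None and i - start[0] >= min_len:
            runs.append((start[0], start[1], i - start[0]))
    return runs
```

With exact matching (`tol=0`), `common_runs([5, 6, 7, 8], [0, 20, 5, 6, 7, 30], tol=0)` returns `[(0, 2, 3)]`: a run of three symbols starting at query position 0 and target position 2. The diagonal scan is quadratic in sequence length; the automaton described in the abstract exists precisely to avoid that cost, and the later steps (SHSP selection, probabilistic scoring, ranking) are not covered here.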


Pages: 137-151

Page count: 15

