YAKUSA: A fast structural database scanning method

Cited by: 51
Authors
Carpentier, M [1]
Brouillet, S [1]
Pothier, J [1]
Affiliation
[1] Univ Paris 06, Atelier BioInformat, F-75005 Paris, France
Keywords
protein structural similarities; protein internal coordinates; mixture transition distribution model
DOI
10.1002/prot.20517
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
YAKUSA is a program designed for rapid scanning of a structural database with a query protein structure. It searches for the longest common substructures, called SHSPs (structural high-scoring pairs), between a query structure and every structure in the structural database. It uses protein backbone internal coordinates (alpha angles) to describe protein structures as sequences of symbols. The structural similarities are established in five steps, the first three of which are analogous to those used in BLAST: (1) building a deterministic finite automaton that describes all patterns identical or similar to those in the query structure; (2) searching for all these patterns in every structure in the database; (3) extending the patterns to longer matching substructures (i.e., SHSPs); (4) selecting compatible SHSPs for each query-database structure pair; and (5) ranking the query-database structure pairs using three scores based on SHSP similarity, on SHSP probabilities, and on the spatial compatibility of SHSPs. Structural fragment probabilities are estimated with a mixture transition distribution model, which approximates a high-order Markov chain model. With regard to the sensitivity and selectivity of its structural matches, YAKUSA compares well with the best related programs while being far faster: a typical database scan takes about 40 s of CPU time on a desktop personal computer. It has also been implemented on a Web server for real-time searches.
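The abstract describes structures as symbol sequences derived from backbone alpha angles. A minimal sketch of such an encoding follows, assuming the usual definition of the alpha angle as the virtual dihedral angle over four consecutive C-alpha atoms, and assuming a hypothetical fixed-width binning (`width=10` degrees) as the symbol alphabet. YAKUSA's actual alphabet and bin boundaries are not stated in this record, so `encode` is illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle in degrees, in [-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / n, b1[1] / n, b1[2] / n)
    # Project b0 and b2 onto the plane perpendicular to the central bond b1.
    k0, k2 = _dot(b0, b1), _dot(b2, b1)
    v = _sub(b0, (k0 * b1[0], k0 * b1[1], k0 * b1[2]))
    w = _sub(b2, (k2 * b1[0], k2 * b1[1], k2 * b1[2]))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def alpha_angles(ca: Sequence[Vec]) -> List[float]:
    """Alpha angle i: dihedral over C-alpha atoms i, i+1, i+2, i+3."""
    return [dihedral(*ca[i:i + 4]) for i in range(len(ca) - 3)]


def encode(angles: Sequence[float], width: float = 10.0) -> List[int]:
    """Map each angle to a bin index (hypothetical fixed-width alphabet)."""
    nbins = int(round(360.0 / width))
    return [int(((a + 180.0) % 360.0) // width) % nbins for a in angles]
```

A chain of n C-alpha atoms yields n - 3 alpha angles, so `encode(alpha_angles(ca))` turns a structure into an integer sequence that string-matching machinery can scan.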
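Steps (1) and (2) of the abstract, building an automaton for all query patterns and their similar variants and then scanning every database structure, can be sketched as below. This sketch assumes that symbols are integer bin indices, that "similar" means each symbol lies within `tol` bins of the query symbol (with wrap-around, since angles are circular), and that seeds have a fixed length `w`. The names `neighbourhood`, `SeedAutomaton` and `build_automaton` are hypothetical. The matcher is an Aho-Corasick automaton (a trie with failure links), one standard way to recognize many patterns in a single pass; it is not claimed to be YAKUSA's exact construction.

```python
from collections import deque
from itertools import product
from typing import Dict, Iterator, List, Sequence, Set, Tuple


def neighbourhood(word: Sequence[int], tol: int, nbins: int) -> Set[Tuple[int, ...]]:
    """All words whose symbols each lie within `tol` bins of `word` (circularly)."""
    choices = [[(s + d) % nbins for d in range(-tol, tol + 1)] for s in word]
    return {tuple(w) for w in product(*choices)}


class SeedAutomaton:
    """Aho-Corasick automaton: a trie over seed words plus failure links."""

    def __init__(self, w: int) -> None:
        self.w = w
        self.goto: List[Dict[int, int]] = [{}]
        self.fail: List[int] = [0]
        self.out: List[List[int]] = [[]]  # query start positions per state

    def add(self, word: Sequence[int], qpos: int) -> None:
        s = 0
        for c in word:
            if c not in self.goto[s]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[s][c] = len(self.goto) - 1
            s = self.goto[s][c]
        self.out[s].append(qpos)

    def finalize(self) -> None:
        # Breadth-first pass so that shallower failure targets are set first.
        queue = deque(self.goto[0].values())
        while queue:
            s = queue.popleft()
            for c, t in self.goto[s].items():
                queue.append(t)
                f = self.fail[s]
                while f and c not in self.goto[f]:
                    f = self.fail[f]
                self.fail[t] = self.goto[f].get(c, 0)
                self.out[t] = self.out[t] + self.out[self.fail[t]]

    def scan(self, seq: Sequence[int]) -> Iterator[Tuple[int, int]]:
        """Yield (database start, query start) for every seed hit in `seq`."""
        s = 0
        for i, c in enumerate(seq):
            while s and c not in self.goto[s]:
                s = self.fail[s]
            s = self.goto[s].get(c, 0)
            for qpos in self.out[s]:
                yield i - self.w + 1, qpos


def build_automaton(query: Sequence[int], w: int, tol: int, nbins: int) -> SeedAutomaton:
    """Insert every query seed and its tolerated variants, then link failures."""
    ac = SeedAutomaton(w)
    for q in range(len(query) - w + 1):
        for word in neighbourhood(query[q:q + w], tol, nbins):
            ac.add(word, q)
    ac.finalize()
    return ac
```

Because each seed's neighbourhood has up to (2*tol + 1)^w words, this explicit enumeration is only practical for small `w` and `tol`; the payoff is that each database sequence is then read once, in time linear in its length plus the number of hits.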
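Step (3), extending seed hits into longer matching substructures, is sketched below as an ungapped X-drop extension in the style of BLAST. The scoring (+1 when two symbols are within `tol` bins on a circular alphabet, `mismatch` otherwise), the `xdrop` cutoff and the function names are illustrative assumptions, not YAKUSA's published scoring scheme.

```python
from typing import Sequence, Tuple


def circ_diff(a: int, b: int, nbins: int) -> int:
    """Distance between two bin indices on a circular alphabet."""
    d = abs(a - b) % nbins
    return min(d, nbins - d)


def extend_seed(query: Sequence[int], db: Sequence[int], q0: int, d0: int, w: int,
                tol: int, nbins: int, mismatch: int = -2,
                xdrop: int = 3) -> Tuple[int, int, int, int]:
    """Ungapped X-drop extension of a length-w seed at (q0, d0).

    Returns (query start, database start, length, score).
    """
    def score(i: int, j: int) -> int:
        return 1 if circ_diff(query[i], db[j], nbins) <= tol else mismatch

    def best_run(step: int, qi: int, dj: int) -> Tuple[int, int]:
        # Walk away from the seed, keep the best-scoring prefix,
        # and stop once the running score falls more than `xdrop` below it.
        best, run, k, best_len = 0, 0, 0, 0
        while 0 <= qi < len(query) and 0 <= dj < len(db):
            run += score(qi, dj)
            k += 1
            if run > best:
                best, best_len = run, k
            elif best - run > xdrop:
                break
            qi, dj = qi + step, dj + step
        return best, best_len

    seed = sum(score(q0 + k, d0 + k) for k in range(w))
    r_score, r_len = best_run(+1, q0 + w, d0 + w)
    l_score, l_len = best_run(-1, q0 - 1, d0 - 1)
    return q0 - l_len, d0 - l_len, l_len + w + r_len, seed + l_score + r_score
```

For instance, a seed of length 2 at query position 1 and database position 1 of `[0, 5, 10, 15, 20, 25]` and `[30, 5, 10, 15, 21, 3]` extends two symbols to the right and none to the left, giving a matched substructure of length 4.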
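The mixture transition distribution (MTD) model named in the abstract approximates an order-l Markov chain with l lag weights lambda_g and a single m x m transition matrix Q: P(x_t | x_{t-1}, ..., x_{t-l}) = sum over g = 1..l of lambda_g * Q[x_{t-g}][x_t]. This replaces the m^l (m - 1) free parameters of a full order-l chain with (l - 1) + m (m - 1). The sketch below scores a symbol sequence under that formula, assuming the common constrained form (lambda_g non-negative, summing to 1). How YAKUSA handles the first l symbols is not stated here; this sketch scores them with a hypothetical background distribution `pi`.

```python
import math
from typing import Sequence


def mtd_log_prob(seq: Sequence[int], lambdas: Sequence[float],
                 Q: Sequence[Sequence[float]], pi: Sequence[float]) -> float:
    """Log-probability of `seq` under an MTD model of order l = len(lambdas).

    P(x_t | x_{t-1}, ..., x_{t-l}) = sum_g lambdas[g-1] * Q[x_{t-g}][x_t].
    The first l symbols lack a full history and are scored with the
    hypothetical background distribution `pi` (an assumption of this sketch).
    """
    if abs(sum(lambdas) - 1.0) > 1e-9 or min(lambdas) < 0:
        raise ValueError("lag weights must be non-negative and sum to 1")
    l = len(lambdas)
    logp = 0.0
    for t, x in enumerate(seq):
        if t < l:
            p = pi[x]
        else:
            # Mixture of one-step transitions from each of the l previous symbols.
            p = sum(lambdas[g - 1] * Q[seq[t - g]][x] for g in range(1, l + 1))
        logp += math.log(p)
    return logp
```

Because every row of Q sums to 1 and the weights sum to 1, each conditional mixture is itself a valid distribution; with a single weight `[1.0]` the model reduces to an ordinary first-order Markov chain.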
Pages: 137-151
Number of pages: 15