Improved gapped alignment in BLAST

被引:75
作者
Cameron, M [1 ]
Williams, HE [1 ]
Cannane, A [1 ]
机构
[1] RMIT Univ, Sch Comp Sci & Informat Technol, Melbourne, Vic 3001, Australia
基金
澳大利亚研究理事会;
关键词
sequence alignment; BLAST; dynamic programming; homology search;
D O I
10.1109/TCBB.2004.32
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Homology search is a key tool for understanding the role, structure, and biochemical function of genomic sequences. The most popular technique for rapid homology search is BLAST, which has been in widespread use within universities, research centers, and commercial enterprises since the early 1990s. In this paper, we propose a new step in the BLAST algorithm to reduce the computational cost of searching with negligible effect on accuracy. This new step-semigapped alignment-compromises between the efficiency of ungapped alignment and the accuracy of gapped alignment, allowing BLAST to accurately filter sequences with lower computational cost. In addition, we propose a heuristic-restricted insertion alignment-that avoids unlikely evolutionary paths with the aim of reducing gapped alignment cost with negligible effect on accuracy. Together, after including an optimization of the local alignment recursion, our two techniques more than double the speed of the gapped alignment stages in BLAST. We conclude that our techniques are an important improvement to the BLAST algorithm. Source code for the alignment algorithms is available for download at http://www.bsg.rmit.edu.au/iga/.
引用
收藏
页码:116 / 129
页数:14
相关论文
共 38 条
  • [31] Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods
    Park, J
    Karplus, K
    Barrett, C
    Hughey, R
    Haussler, D
    Hubbard, T
    Chothia, C
    [J]. JOURNAL OF MOLECULAR BIOLOGY, 1998, 284 (04) : 1201 - 1210
  • [32] PEARSON WR, 1992, METHOD ENZYMOL, V210, P575
  • [33] IMPROVED TOOLS FOR BIOLOGICAL SEQUENCE COMPARISON
    PEARSON, WR
    LIPMAN, DJ
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1988, 85 (08) : 2444 - 2448
  • [34] Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements
    Schäffer, AA
    Aravind, L
    Madden, TL
    Shavirin, S
    Spouge, JL
    Wolf, YI
    Koonin, EV
    Altschul, SF
    [J]. NUCLEIC ACIDS RESEARCH, 2001, 29 (14) : 2994 - 3005
  • [35] IDENTIFICATION OF COMMON MOLECULAR SUBSEQUENCES
    SMITH, TF
    WATERMAN, MS
    [J]. JOURNAL OF MOLECULAR BIOLOGY, 1981, 147 (01) : 195 - 197
  • [36] RAPID SIMILARITY SEARCHES OF NUCLEIC-ACID AND PROTEIN DATA BANKS
    WILBUR, WJ
    LIPMAN, DJ
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA-BIOLOGICAL SCIENCES, 1983, 80 (03): : 726 - 730
  • [37] Alignments without low-scoring regions
    Zhang, Z
    Berman, P
    Miller, W
    [J]. JOURNAL OF COMPUTATIONAL BIOLOGY, 1998, 5 (02) : 197 - 210
  • [38] Aligning a DNA sequence with a protein sequence
    Zhang, Z
    Pearson, WR
    Miller, W
    [J]. JOURNAL OF COMPUTATIONAL BIOLOGY, 1997, 4 (03) : 339 - 349