Accelerated Profile HMM Searches

Cited by: 4161
Author
Eddy, Sean R. [1]
Affiliation
[1] HHMI Janelia Farm Res Campus, Ashburn, VA USA
Keywords
HIDDEN MARKOV-MODELS; PROTEIN DATABASE SEARCHES; PSI-BLAST; SPEED-UP; SIMILARITY; ACCURACY;
DOI
10.1371/journal.pcbi.1002195
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Profile hidden Markov models (profile HMMs) and probabilistic inference methods have made important contributions to the theory of sequence database homology search. However, practical use of profile HMM methods has been hindered by the computational expense of existing software implementations. Here I describe an acceleration heuristic for profile HMMs, the "multiple segment Viterbi" (MSV) algorithm. The MSV algorithm computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment. MSV scores follow the same statistical distribution as gapped optimal local alignment scores, allowing rapid evaluation of significance of an MSV score and thus facilitating its use as a heuristic filter. I also describe a 20-fold acceleration of the standard profile HMM Forward/Backward algorithms using a method I call "sparse rescaling". These methods are assembled in a pipeline in which high-scoring MSV hits are passed on for reanalysis with the full HMM Forward/Backward algorithm. This accelerated pipeline is implemented in the freely available HMMER3 software package. Performance benchmarks show that the use of the heuristic MSV filter sacrifices negligible sensitivity compared to unaccelerated profile HMM searches. HMMER3 is substantially more sensitive and 100- to 1000-fold faster than HMMER2. HMMER3 is now about as fast as BLAST for protein searches.
Pages: 16
Related papers
44 in total
  • [1] Protein database searches using compositionally adjusted substitution matrices
    Altschul, SF
    Wootton, JC
    Gertz, EM
    Agarwala, R
    Morgulis, A
    Schäffer, AA
    Yu, YK
    [J]. FEBS JOURNAL, 2005, 272 (20) : 5101 - 5109
  • [2] The estimation of statistical parameters for local alignment score distributions
    Altschul, SF
    Bundschuh, R
    Olsen, R
    Hwa, T
    [J]. NUCLEIC ACIDS RESEARCH, 2001, 29 (02) : 351 - 361
  • [3] Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
    Altschul, SF
    Madden, TL
    Schäffer, AA
    Zhang, JH
    Zhang, Z
    Miller, W
    Lipman, DJ
    [J]. NUCLEIC ACIDS RESEARCH, 1997, 25 (17) : 3389 - 3402
  • [4] Basic local alignment search tool
    Altschul, SF
    Gish, W
    Miller, W
    Myers, EW
    Lipman, DJ
    [J]. JOURNAL OF MOLECULAR BIOLOGY, 1990, 215 (03) : 403 - 410
  • [5] [Anonymous], THESIS WASHINGTON U
  • [6] Ongoing and future developments at the Universal Protein Resource
    Apweiler, Rolf
    Martin, Maria Jesus
    O'Donovan, Claire
    Magrane, Michele
    Alam-Faruque, Yasmin
    Antunes, Ricardo
    Barrell, Daniel
    Bely, Benoit
    Bingley, Mark
    Binns, David
    Bower, Lawrence
    Browne, Paul
    Chan, Wei Mun
    Dimmer, Emily
    Eberhardt, Ruth
    Fazzini, Francesco
    Fedotov, Alexander
    Foulger, Rebecca
    Garavelli, John
    Castro, Leyla Garcia
    Huntley, Rachael
    Jacobsen, Julius
    Kleen, Michael
    Laiho, Kati
    Legge, Duncan
    Lin, Quan
    Liu, Wudong
    Luo, Jie
    Orchard, Sandra
    Patient, Samuel
    Pichler, Klemens
    Poggioli, Diego
    Pontikos, Nikolas
    Pruess, Manuela
    Rosanoff, Steven
    Sawford, Tony
    Sehra, Harminder
    Turner, Edward
    Corbett, Matt
    Donnelly, Mike
    van Rensburg, Pieter
    Xenarios, Ioannis
    Bougueleret, Lydie
    Auchincloss, Andrea
    Argoud-Puy, Ghislaine
    Axelsen, Kristian
    Bairoch, Amos
    Baratin, Delphine
    Blatter, Marie-Claude
    Boeckmann, Brigitte
    [J]. NUCLEIC ACIDS RESEARCH, 2011, 39 : D214 - D219
  • [7] BLAST+: architecture and applications
    Camacho, Christiam
    Coulouris, George
    Avagyan, Vahram
    Ma, Ning
    Papadopoulos, Jason
    Bealer, Kevin
    Madden, Thomas L.
    [J]. BMC BIOINFORMATICS, 2009, 10
  • [8] Chaudhary V., 2006, PARALLEL COMPUTING B, P233
  • [9] SledgeHMMER: a web server for batch searching the Pfam database
    Chukkapalli, G
    Guda, C
    Subramaniam, S
    [J]. NUCLEIC ACIDS RESEARCH, 2004, 32 : W542 - W544
  • [10] Derrien S, 2007, P 18 IEEE INT C APPL