Background: The most widely used multiple sequence alignment methods require sequences to be clustered as an initial step. Most sequence clustering methods require a full distance matrix to be computed between all pairs of sequences. This requires memory and time proportional to N² for N sequences. When N grows beyond about 10,000, this becomes increasingly prohibitive and can form a significant barrier to carrying out very large multiple alignments.
Results: In this paper, we have tested variations on a class of embedding methods that were designed for clustering large numbers of complex objects where the individual distance calculations are expensive. These methods embed the sequences in a space where the similarities within a set of sequences can be closely approximated without computing all pairwise distances.
Conclusions: We show how this approach greatly reduces computation time and memory requirements for clustering large numbers of sequences, and we demonstrate the quality of the clusterings by benchmarking them as guide trees for multiple alignment. Source code is available for download from http://www.clustal.org/mbed.tgz.
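The embedding idea described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation (their code is in the mbed.tgz archive linked above). The k-tuple distance formula, the choice of roughly (log2 N)² evenly spaced seed sequences, and the 2-means split used to cluster the embedded vectors are all assumptions made for illustration, and every function name here is hypothetical. Each sequence is represented only by its distances to a small set of seeds, so the number of expensive sequence comparisons grows as N times the seed count rather than N².

```python
import math
from collections import Counter


def kmer_counts(seq: str, k: int = 3) -> Counter:
    """Count the overlapping k-mers (k-tuples) in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a: str, b: str, k: int = 3) -> float:
    """Assumed k-tuple distance: 1 minus the number of shared k-mers divided
    by the k-mer count of the shorter sequence (0.0 = identical profiles)."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    shared = sum((ca & cb).values())  # multiset intersection of k-mers
    denom = min(sum(ca.values()), sum(cb.values()))
    return 1.0 - shared / denom if denom > 0 else 1.0


def choose_seeds(n: int) -> list[int]:
    """Pick about (log2 n)^2 evenly spaced seed indices (assumed heuristic)."""
    if n == 0:
        return []
    t = min(n, max(1, math.ceil(math.log2(max(n, 2)) ** 2)))
    step = n / t  # step >= 1, so the indices below are distinct
    return [int(i * step) for i in range(t)]


def embed(seqs: list[str], k: int = 3) -> list[list[float]]:
    """Map each sequence to its vector of distances to the seed sequences.
    This costs len(seqs) * len(seeds) distance calls, not all pairs."""
    seeds = [seqs[i] for i in choose_seeds(len(seqs))]
    return [[kmer_distance(s, seed, k) for seed in seeds] for s in seqs]


def euclid(u: list[float], v: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def two_means_split(vectors: list[list[float]], iters: int = 10) -> list[int]:
    """One 2-means split in the embedded space; returns a 0/1 label per vector.
    Repeating it on each half gives a top-down (bisecting) cluster tree."""
    # Deterministic start: the first vector and the vector farthest from it.
    centroids = [vectors[0], max(vectors, key=lambda v: euclid(v, vectors[0]))]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [0 if euclid(v, centroids[0]) <= euclid(v, centroids[1]) else 1
                  for v in vectors]
        for lab in (0, 1):
            members = [v for v, l in zip(vectors, labels) if l == lab]
            if members:
                centroids[lab] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def distance_calls(n: int) -> tuple[int, int]:
    """Sequence-distance evaluations: seed embedding vs. full distance matrix."""
    return n * len(choose_seeds(n)), n * (n - 1) // 2


seqs = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC", "TTTTGGGGCA"]
print(two_means_split(embed(seqs)))  # [0, 0, 1, 1]
```

With the assumed seed heuristic, N = 100,000 sequences need about 2.8 × 10^7 sequence-to-seed comparisons rather than about 5 × 10^9 pairwise ones, and memory holds an N × t table of distances instead of an N × N matrix. Clustering then works only on the short vectors, which are cheap to compare.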
Cole, J. R.
Affiliation: Michigan State Univ, Ctr Microbial Ecol, E Lansing, MI 48824 USA
Wang, Q.
Affiliation: Michigan State Univ, Ctr Microbial Ecol, E Lansing, MI 48824 USA
Cardenas, E.
Affiliation: Michigan State Univ, Ctr Microbial Ecol, E Lansing, MI 48824 USA
Fish, J.
Affiliations: Michigan State Univ, Dept Microbiol & Mol Genet, E Lansing, MI 48824 USA; Michigan State Univ, Ctr Microbial Ecol, E Lansing, MI 48824 USA
Chai, B.
Affiliation: Michigan State Univ, Ctr Microbial Ecol, E Lansing, MI 48824 USA
Farris, R. J.
Affiliation: Michigan State Univ, Ctr Microbial Ecol, E Lansing, MI 48824 USA
Kulam-Syed-Mohideen, A. S.
Affiliation: Michigan State Univ, Ctr Microbial Ecol, E Lansing, MI 48824 USA
McGarrell, D. M.
Affiliation: Michigan State Univ, Ctr Microbial Ecol, E Lansing, MI 48824 USA
Marsh, T.
Garrity, G. M.
Affiliations: Michigan State Univ, Dept Microbiol & Mol Genet, E Lansing, MI 48824 USA; Michigan State Univ, Ctr Microbial Ecol, E Lansing, MI 48824 USA
Tiedje, J. M.
Affiliations: Michigan State Univ, Ctr Microbial Ecol, E Lansing, MI 48824 USA; Michigan State Univ, Dept Microbiol & Mol Genet, E Lansing, MI 48824 USA