Adaptive seeds tame genomic sequence comparison

Cited by: 908
Authors
Kielbasa, Szymon M. [2]
Wan, Raymond [1]
Sato, Kengo [3]
Horton, Paul [1]
Frith, Martin C. [1]
Affiliations
[1] Natl Inst Adv Ind Sci & Technol, Computat Biol Res Ctr, Koto Ku, Tokyo 1350064, Japan
[2] Max Planck Inst Mol Genet, Dept Computat Biol, D-14195 Berlin, Germany
[3] Univ Tokyo, Grad Sch Frontier Sci, Chiba 2778561, Japan
Keywords
GENERATION; REPEATS; BLAST;
DOI
10.1101/gr.113985.110
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
The main way of analyzing biological sequences is by comparing and aligning them to each other. It remains difficult, however, to compare modern multi-billion-base DNA data sets. The difficulty is caused by the nonuniform (oligo)nucleotide composition of these sequences, rather than their size per se. To solve this problem, we modified the standard seed-and-extend approach (e.g., BLAST) to use adaptive seeds. Adaptive seeds are matches that are chosen based on their rareness, instead of using fixed-length matches. This method guarantees that the number of matches, and thus the running time, increases linearly, instead of quadratically, with sequence length. LAST, our open source implementation of adaptive seeds, enables fast and sensitive comparison of large sequences with arbitrarily nonuniform composition.
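The idea of choosing seeds by rareness can be illustrated with a toy sketch. The code below is not taken from LAST and makes several assumptions: the function name `adaptive_seeds` and the parameter `max_hits` are hypothetical, the search is a naive scan rather than an indexed lookup, and only exact matches are considered. For each start position in the query, the match is lengthened one base at a time until it occurs at most `max_hits` times in the reference, so matches in repetitive regions grow longer while matches in unique regions stay short.

```python
def adaptive_seeds(reference, query, max_hits=1):
    """Toy adaptive seeding (illustrative only, not the LAST implementation).

    For each query start position, lengthen the exact match until it occurs
    at most `max_hits` times in `reference`. Returns a list of tuples
    (query_start, seed_length, reference_positions).
    """
    seeds = []
    for start in range(len(query)):
        # Every reference position matches the empty string.
        hits = list(range(len(reference)))
        length = 0
        # Grow the match while it is still too frequent and query bases remain.
        while len(hits) > max_hits and start + length < len(query):
            base = query[start + length]
            hits = [
                pos for pos in hits
                if pos + length < len(reference)
                and reference[pos + length] == base
            ]
            length += 1
        # Keep the match only if it is non-empty, present, and rare enough.
        if length > 0 and 0 < len(hits) <= max_hits:
            seeds.append((start, length, hits))
    return seeds


if __name__ == "__main__":
    # "CGT" occurs twice in the reference, so the seed starting at query
    # position 0 must grow to four bases before it becomes unique.
    for seed in adaptive_seeds("ACGTACGTTTGA", "CGTT", max_hits=1):
        print(seed)
```

Because every accepted seed has at most `max_hits` reference positions, the total number of seed hits in this sketch is bounded by `max_hits` times the query length, which mirrors the linear growth described in the abstract. A fixed-length seed has no such bound in repeat-rich sequence.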
Pages: 487-493
Number of pages: 7