Benchmarking short sequence mapping tools

Cited by: 122
Authors
Hatem, Ayat [1, 2]
Bozdag, Doruk [2]
Toland, Amanda E. [3]
Catalyuerek, Uemit V. [1, 2]
Affiliations
[1] Ohio State Univ, Dept Elect & Comp Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Dept Biomed Informat, Columbus, OH 43210 USA
[3] Ohio State Univ, Dept Mol Virol Immunol & Med Genet, Columbus, OH 43210 USA
Source
BMC BIOINFORMATICS | 2013 / Vol. 14
Keywords
Short sequence mapping; Next-generation sequencing; Benchmark; Sequence analysis; SHORT-READ ALIGNMENT; GENOME; ALGORITHMS; ULTRAFAST; ACCURACY
DOI
10.1186/1471-2105-14-184
CLC number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: The development of next-generation sequencing instruments has led to the generation of millions of short sequences in a single run. Aligning these reads to a reference genome is time-consuming and demands fast and accurate alignment tools. However, the currently proposed tools make different compromises between the accuracy and the speed of mapping. Moreover, many important aspects are overlooked when the performance of a newly developed tool is compared to the state of the art. Therefore, there is a need for an objective evaluation method that covers all of these aspects. In this work, we introduce a benchmarking suite to extensively analyze sequence mapping tools with respect to various aspects and provide an objective comparison.

Results: We applied our benchmarking tests to nine well-known mapping tools, namely Bowtie, Bowtie2, BWA, SOAP2, MAQ, RMAP, GSNAP, Novoalign, and mrsFAST (mrFAST), using synthetic data and real RNA-Seq data. MAQ and RMAP are based on building hash tables for the reads, whereas the remaining tools are based on indexing the reference genome. The benchmarking tests reveal the strengths and weaknesses of each tool. The results show that no single tool outperforms all others in all metrics. However, Bowtie maintained the best throughput for most of the tests, while BWA performed better for longer read lengths. The benchmarking tests are not restricted to the mentioned tools and can be applied to others.

Conclusion: The mapping process is still a hard problem that is affected by many factors. In this work, we provided a benchmarking suite that reveals and evaluates the different factors affecting the mapping process. Still, no tool outperforms all of the others in all the tests. Therefore, end users should clearly specify their needs in order to choose the tool that provides the best results.
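The abstract describes benchmarking on synthetic data and the trade-off between throughput and accuracy. The following is a minimal Python sketch of that kind of measurement, not the authors' benchmarking suite and not the algorithm of any tool named above. It assumes a toy mapper that indexes the reference genome by k-mers and verifies candidates by Hamming distance; all function names, parameters, and default values are hypothetical and chosen for illustration only.

```python
import random
import time
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map each k-mer of the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read, reference, index, k, max_mismatches):
    """Toy mapper (illustrative assumption, not any published tool).

    Splits the read into max_mismatches + 1 non-overlapping seeds, so at
    least one seed is error-free if the read has at most max_mismatches
    substitutions, then verifies each candidate by Hamming distance.
    """
    best_pos, best_mm = None, max_mismatches + 1
    seen = set()
    for seed_no in range(max_mismatches + 1):
        offset = seed_no * k
        for hit in index.get(read[offset:offset + k], ()):
            start = hit - offset
            if start < 0 or start + len(read) > len(reference) or start in seen:
                continue
            seen.add(start)
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches < best_mm:
                best_pos, best_mm = start, mismatches
    return best_pos


def simulate_reads(reference, n_reads, read_len, n_errors, rng):
    """Sample reads at known origins and inject n_errors substitutions each."""
    reads = []
    for _ in range(n_reads):
        origin = rng.randrange(len(reference) - read_len + 1)
        bases = list(reference[origin:origin + read_len])
        for i in rng.sample(range(read_len), n_errors):
            bases[i] = rng.choice([b for b in "ACGT" if b != bases[i]])
        reads.append(("".join(bases), origin))
    return reads


def benchmark(n_reads=500, ref_len=20000, read_len=36, k=12, n_errors=2, seed=0):
    """Report percentage mapped, percentage mapped to the true origin, and reads/s."""
    rng = random.Random(seed)
    reference = "".join(rng.choice("ACGT") for _ in range(ref_len))
    index = build_kmer_index(reference, k)
    reads = simulate_reads(reference, n_reads, read_len, n_errors, rng)

    start = time.perf_counter()
    results = [map_read(r, reference, index, k, n_errors) for r, _ in reads]
    elapsed = max(time.perf_counter() - start, 1e-9)

    mapped = sum(p is not None for p in results)
    correct = sum(p == origin for p, (_, origin) in zip(results, reads))
    return {
        "mapped_pct": 100.0 * mapped / n_reads,
        "correct_pct": 100.0 * correct / n_reads,
        "reads_per_sec": n_reads / elapsed,
    }


if __name__ == "__main__":
    print(benchmark())
```

Because the true origin of every synthetic read is known, the share of correctly mapped reads can be reported alongside throughput; real RNA-Seq data offers no such ground truth, so only measures such as the percentage of mapped reads are directly available there.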
Pages: 25