SeqAn: An efficient, generic C++ library for sequence analysis

Cited by: 210
Authors
Doering, Andreas [1]
Weese, David [1]
Rausch, Tobias [1, 2]
Reinert, Knut
Affiliations
[1] Inst Informat, D-14195 Berlin, Germany
[2] Int Max Planck Res Sch Computat Biol & Sci Comp, D-14195 Berlin, Germany
DOI
10.1186/1471-2105-9-11
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: The use of novel algorithmic techniques is pivotal to many important problems in life science. For example, the sequencing of the human genome [1] would not have been possible without advanced assembly algorithms. However, owing to the high speed of technological progress and the urgent need for bioinformatics tools, there is a widening gap between state-of-the-art algorithmic techniques and the actual algorithmic components of tools that are in widespread use.

Results: To remedy this trend, we propose the use of SeqAn, a library of efficient data types and algorithms for sequence analysis in computational biology. SeqAn comprises implementations of existing, practical state-of-the-art algorithmic components to provide a sound basis for algorithm testing and development. In this paper we describe the design and content of SeqAn and demonstrate its use with two examples. In the first example we show an application of SeqAn as an experimental platform by comparing different exact string matching algorithms. The second example is a simple version of the well-known MUMmer tool rewritten in SeqAn. Results indicate that our implementation is very efficient and versatile.

Conclusion: We anticipate that SeqAn greatly simplifies the rapid development of new bioinformatics tools by providing a collection of readily usable, well-designed algorithmic components that are fundamental to the field of sequence analysis. This not only facilitates the implementation of new algorithms but also enables a sound analysis and comparison of existing algorithms.
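The first example in the abstract uses SeqAn as a platform for comparing exact string matching algorithms. As a minimal, library-independent sketch of what such a comparison involves, the block below implements two classic exact matchers, a naive window-by-window scan and Horspool's bad-character-shift algorithm, so that their results can be cross-checked on the same input. It is written in plain standard C++ and is not SeqAn's interface; the function names naiveSearch and horspoolSearch are hypothetical, and the 256-entry shift table assumes byte-sized characters.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative sketch only: naiveSearch and horspoolSearch are hypothetical
// helper names written in plain standard C++; they are not SeqAn's API.

// Naive exact matching: compare the pattern against every window of the text.
// Returns the 0-based start positions of all occurrences.
std::vector<std::size_t> naiveSearch(const std::string& text, const std::string& pattern) {
    std::vector<std::size_t> hits;
    const std::size_t m = pattern.size();
    if (m == 0 || m > text.size()) return hits;
    for (std::size_t i = 0; i + m <= text.size(); ++i) {
        if (text.compare(i, m, pattern) == 0) hits.push_back(i);
    }
    return hits;
}

// Horspool's algorithm: after checking a window, shift it by an amount looked
// up from the text character aligned with the pattern's last position.
std::vector<std::size_t> horspoolSearch(const std::string& text, const std::string& pattern) {
    std::vector<std::size_t> hits;
    const std::size_t m = pattern.size();
    const std::size_t n = text.size();
    if (m == 0 || m > n) return hits;

    // Bad-character table: distance from the last occurrence of each character
    // in pattern[0..m-2] to the end of the pattern; absent characters shift by m.
    std::array<std::size_t, 256> shift;
    shift.fill(m);
    for (std::size_t j = 0; j + 1 < m; ++j) {
        shift[static_cast<unsigned char>(pattern[j])] = m - 1 - j;
    }

    std::size_t i = 0;
    while (i + m <= n) {
        if (text.compare(i, m, pattern) == 0) hits.push_back(i);
        i += shift[static_cast<unsigned char>(text[i + m - 1])];  // always >= 1
    }
    return hits;
}
```

For example, both functions return the positions 0 and 9 for the pattern ACGTA in the text ACGTACGTTACGTACG. The two report identical occurrences, but Horspool can skip up to m text positions per step instead of one, which is the kind of running-time trade-off an experimental comparison platform is meant to measure.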
Pages: 9
References (43 in total)
  • [1] Abouelhoda MI, 2002, String Processing and Information Retrieval (SPIRE 2002), Lecture Notes in Computer Science, V2476, P31
  • [2] Abouelhoda MI, 2004, Journal of Discrete Algorithms, V2, P53, DOI 10.1016/S1570-8667(03)00065-0
  • [3] Abouelhoda MI, 2003, Lecture Notes in Computer Science, V2676, P1
  • [4] Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ, 1990, Basic local alignment search tool, Journal of Molecular Biology, V215(3), P403-410
  • [5] [Anonymous], IMPERFECT C PRACTICA
  • [6] [Anonymous], 2002, FLEXIBLE PATTERN MAT
  • [7] [Anonymous], NCBI C TOOLKIT
  • [8] [Anonymous], DOI 10.1145/299432.299460
  • [9] [Anonymous], LECT NOTES COMPUTER
  • [10] [Anonymous], 1998, Generic Programming and the STL