Short Read Mapping: An Algorithmic Tour

Cited by: 48
Authors
Canzar, Stefan [1, 2]
Salzberg, Steven L. [3, 4, 5, 6]
Affiliations
[1] Johns Hopkins Univ, Sch Med, Ctr Computat Biol, McKusick Nathans Inst Genet Med, Baltimore, MD 21205 USA
[2] Toyota Technol Inst, Chicago, IL USA
[3] Johns Hopkins Univ, Ctr Computat Biol, McKusick Nathans Inst Genet Med, Baltimore, MD 21205 USA
[4] Johns Hopkins Univ, Inst Med Genet, Baltimore, MD 21205 USA
[5] Johns Hopkins Univ, Bloomberg Sch Publ Hlth, Dept Biostat, Baltimore, MD 21205 USA
[6] Johns Hopkins Univ, Dept Comp Sci, Baltimore, MD 21218 USA
Funding
US National Institutes of Health (NIH)
Keywords
Burrows-Wheeler transform; DNA sequencing; sequence alignment; string matching; suffix trees; BASIC LOCAL ALIGNMENT; LARGE GENOMES; DE-NOVO; SEQUENCING READS; ACCURATE; SEARCH; GENERATION; FASTER; ALIGNER; TOOL;
DOI
10.1109/JPROC.2015.2455551
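Chinese Library Classification number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract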
Ultra-high-throughput next-generation sequencing (NGS) technology allows us to determine the sequence of nucleotides of many millions of DNA molecules in parallel. Accompanied by a dramatic reduction in cost since its introduction in 2004, NGS technology has provided a new way of addressing a wide range of biological and biomedical questions, from the study of human genetic disease to the analysis of gene expression, protein-DNA interactions, and patterns of DNA methylation. The data generated by NGS instruments comprise huge numbers of very short DNA sequences, or "reads," that carry little information by themselves. These reads therefore have to be pieced together by well-engineered algorithms to reconstruct biologically meaningful measurements, such as the level of expression of a gene. To solve this complex, high-dimensional puzzle, reads must be mapped back to a reference genome to determine their origin. Due to sequencing errors and to genuine differences between the reference genome and the individual being sequenced, this mapping process must be tolerant of mismatches, insertions, and deletions. Although optimal alignment algorithms to solve this problem have long been available, the practical requirements of aligning hundreds of millions of short reads to the 3-billion-base-pair-long human genome have stimulated the development of new, more efficient methods, which today are used routinely throughout the world for the analysis of NGS data.
Pages: 436-458
Number of pages: 23