Evaluation and assessment of read-mapping by multiple next-generation sequencing aligners based on genome-wide characteristics

Cited by: 51
Authors
Thankaswamy-Kosalai, Subazini [1]
Sen, Partho [1]
Nookaew, Intawat [1, 2]
Affiliations
[1] Chalmers Univ Technol, Dept Biol & Biol Engn, Kemivagen 10, SE-41296 Gothenburg, Sweden
[2] Univ Arkansas Med Sci, Dept Biomed Informat, Coll Med, Little Rock, AR 72205 USA
Keywords
Next-generation sequencing; NGS; Aligners; Alignments; Mapping; Algorithm; Reads; Genome; Tandem repeats
DOI
10.1016/j.ygeno.2017.03.001
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Massive data produced by next-generation sequencing (NGS) technology are widely used in biological research and medical diagnosis. The crucial step in NGS analysis is read alignment, or mapping, which is computationally intensive and complex. Mapping bias tends to affect downstream analysis, including the detection of polymorphisms. To provide biologists with guidelines for selecting a suitable aligner, we evaluated and benchmarked five aligners (BWA, Bowtie2, NovoAlign, Smalt and Stampy) and their mapping bias based on the characteristics of five microbial genomes. Two million simulated read pairs of various lengths (36 bp, 50 bp, 72 bp, 100 bp, 125 bp, 150 bp, 200 bp, 250 bp and 300 bp) were aligned. Specific alignment features, such as mapping sensitivity, percentage of properly paired reads, alignment time and the effect of tandem repeats on incorrectly mapped reads, were evaluated. BWA was the fastest aligner, followed by Bowtie2 and Smalt; NovoAlign and Stampy were comparatively slower. Most of the aligners showed high sensitivity when mapping long reads (> 100 bp). NovoAlign, on the other hand, showed higher sensitivity for both short reads (36 bp, 50 bp, 72 bp) and long reads (> 100 bp). It also showed higher sensitivity when mapping a complex genome such as Plasmodium falciparum. The percentages of properly paired reads aligned by NovoAlign, BWA and Stampy were markedly higher. No aligner outperformed the others across the benchmark; however, the aligners performed differently depending on genome characteristics. We expect that the results of this study will help end users choose an aligner and thus enhance the accuracy of read mapping. (C) 2017 Elsevier Inc. All rights reserved.
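As an illustration of two of the alignment features named in the abstract, the following is a minimal Python sketch that computes the percentage of mapped reads and the percentage of properly paired reads from the FLAG field of SAM records. It is not the authors' evaluation pipeline. The function name `mapping_summary` and the example records are hypothetical, and the percentage of mapped reads is used only as a rough proxy for mapping sensitivity, since the abstract does not give the exact definition used in the study. The FLAG bit values are those defined in the SAM format specification.

```python
from typing import Dict, Iterable

# FLAG bits as defined in the SAM format specification.
FLAG_PROPER_PAIR = 0x2      # each segment properly aligned according to the aligner
FLAG_UNMAPPED = 0x4         # segment unmapped
FLAG_SECONDARY = 0x100      # secondary alignment
FLAG_SUPPLEMENTARY = 0x800  # supplementary alignment


def mapping_summary(sam_lines: Iterable[str]) -> Dict[str, float]:
    """Summarise primary SAM records (hypothetical helper, not from the paper).

    Assumption: "percentage mapped" stands in for sensitivity; a simulation
    study could instead compare each read against its known true position.
    """
    total = mapped = proper = 0
    for line in sam_lines:
        # Skip blank lines and header lines (which start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # FLAG is the second mandatory SAM column
        # Count each read once: ignore secondary and supplementary records.
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue
        total += 1
        if not flag & FLAG_UNMAPPED:
            mapped += 1
            if flag & FLAG_PROPER_PAIR:
                proper += 1
    if total == 0:
        return {"reads": 0, "pct_mapped": 0.0, "pct_properly_paired": 0.0}
    return {
        "reads": total,
        "pct_mapped": 100.0 * mapped / total,
        "pct_properly_paired": 100.0 * proper / total,
    }


# Made-up records for illustration only (not data from the study).
EXAMPLE_SAM = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t99\tchr1\t100\t60\t36M\t=\t300\t236\t*\t*",   # mapped, properly paired
    "r1\t147\tchr1\t300\t60\t36M\t=\t100\t-236\t*\t*",  # mapped, properly paired
    "r2\t77\t*\t0\t0\t*\t*\t0\t0\t*\t*",                # unmapped
    "r2\t141\t*\t0\t0\t*\t*\t0\t0\t*\t*",               # unmapped
    "r3\t65\tchr1\t500\t30\t36M\t=\t900\t0\t*\t*",      # mapped, not properly paired
    "r3\t129\tchr1\t900\t30\t36M\t=\t500\t0\t*\t*",     # mapped, not properly paired
    "r1\t355\tchr1\t700\t0\t36M\t=\t300\t0\t*\t*",      # secondary, ignored
]

if __name__ == "__main__":
    print(mapping_summary(EXAMPLE_SAM))
```

With simulated reads, as used in the study, the true origin of each read is known, so a stricter measure would count only reads placed at their true position; the sketch above does not attempt that.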
Pages: 186-191
Number of pages: 6