Evaluation and assessment of read-mapping by multiple next-generation sequencing aligners based on genome-wide characteristics

Cited by: 51
Authors
Thankaswamy-Kosalai, Subazini [1]
Sen, Partho [1]
Nookaew, Intawat [1, 2]
Affiliations
[1] Chalmers Univ Technol, Dept Biol & Biol Engn, Kemivagen 10, SE-41296 Gothenburg, Sweden
[2] Univ Arkansas Med Sci, Dept Biomed Informat, Coll Med, Little Rock, AR 72205 USA
Keywords
Next-generation sequencing; NGS; Aligners; Alignments; Mapping; Algorithm; Reads; Genome; Tandem repeats
DOI
10.1016/j.ygeno.2017.03.001
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
The massive volume of data produced by next-generation sequencing (NGS) technology is widely used in biological research and medical diagnosis. The crucial step in NGS analysis is read alignment, or mapping, which is computationally intensive and complex. Mapping bias tends to affect downstream analysis, including the detection of polymorphisms. To provide biologists with guidelines for selecting a suitable aligner, we evaluated and benchmarked five aligners (BWA, Bowtie2, NovoAlign, Smalt and Stampy) and their mapping bias based on the characteristics of five microbial genomes. Two million simulated read pairs of various lengths (36 bp, 50 bp, 72 bp, 100 bp, 125 bp, 150 bp, 200 bp, 250 bp and 300 bp) were aligned. Specific alignment features were evaluated: sensitivity of mapping, percentage of properly paired reads, alignment time, and the effect of tandem repeats on incorrectly mapped reads. BWA was the fastest aligner, followed by Bowtie2 and Smalt; NovoAlign and Stampy were comparatively slower. Most of the aligners showed high sensitivity when mapping long reads (> 100 bp). NovoAlign, by contrast, showed higher sensitivity for both short reads (36 bp, 50 bp, 72 bp) and long reads (> 100 bp); it also showed higher sensitivity when mapping a complex genome such as Plasmodium falciparum. The percentages of properly paired reads aligned by NovoAlign, BWA and Stampy were markedly higher. No aligner outperformed the others across the whole benchmark; however, the aligners performed differently depending on genome characteristics. We expect that the results of this study will help end users choose an aligner and thus enhance the accuracy of read mapping. © 2017 Elsevier Inc. All rights reserved.
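Two of the features named in the abstract, the share of reads that map and the share that are properly paired, can be read off the FLAG field of SAM alignment records. The sketch below is illustrative only and is not the authors' evaluation pipeline: the abstract does not define how sensitivity was computed, so the mapped percentage here is a simple stand-in, and the function name `summarize_sam` and the four example records are hypothetical. Only the flag bit values come from the SAM format specification.

```python
# SAM FLAG bits, as defined in the SAM format specification.
FLAG_PROPER_PAIR = 0x2
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def summarize_sam(lines):
    """Return (percent_mapped, percent_properly_paired) over primary records.

    Hypothetical helper: header lines, secondary and supplementary
    alignments are skipped so that each read is counted once.
    """
    total = mapped = proper = 0
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # FLAG is the second SAM column
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue
        total += 1
        if not flag & FLAG_UNMAPPED:
            mapped += 1
            if flag & FLAG_PROPER_PAIR:
                proper += 1
    if total == 0:
        return 0.0, 0.0
    return 100.0 * mapped / total, 100.0 * proper / total


# Made-up records (not data from the study): two properly paired mates,
# one read whose mate is unmapped, and that unmapped mate.
EXAMPLE_SAM = [
    "@HD\tVN:1.6\tSO:unsorted",
    "r1\t99\tchr1\t100\t60\t36M\t=\t300\t236\t*\t*",
    "r1\t147\tchr1\t300\t60\t36M\t=\t100\t-236\t*\t*",
    "r2\t73\tchr1\t500\t60\t36M\t=\t500\t0\t*\t*",
    "r2\t133\tchr1\t500\t0\t*\t=\t500\t0\t*\t*",
]

if __name__ == "__main__":
    pct_mapped, pct_proper = summarize_sam(EXAMPLE_SAM)
    print(f"mapped: {pct_mapped:.1f}%  properly paired: {pct_proper:.1f}%")
```

With simulated reads, a stricter sensitivity measure would also compare each reported position against the known origin of the read; that step is omitted here because the abstract does not describe how it was done.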
Pages: 186-191
Page count: 6