Statistical Mitogenome Assembly with RepeaTs

Cited by: 7
Authors
Alqahtani, Fahad [1, 2]
Mandoiu, Ion I. [1]
Affiliations
[1] Univ Connecticut, Comp Sci & Engn Dept, 371 Fairfield Way, Unit 4155, Storrs, CT 06269 USA
[2] King Abdulaziz City Sci & Technol, Natl Ctr Artificial Intelligence & Big Data Techn, Riyadh, Saudi Arabia
Funding
US National Science Foundation
Keywords
bootstrapping; de novo assembly; maximum likelihood; mitogenome; repeats; MULTIPLE SEQUENCE ALIGNMENT; COPY NUMBER; GENOME; ACCURACY;
DOI
10.1089/cmb.2019.0505
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Next-generation sequencing technologies make it possible to quickly and inexpensively generate large numbers of relatively short reads from both the nuclear and mitochondrial DNA (mtDNA) contained in a biological sample. Unfortunately, assembling such whole-genome sequencing (WGS) data with standard de novo assemblers often fails to generate high-quality mitochondrial genome sequences because of the large difference in copy number (and hence sequencing depth) between the mitochondrial and nuclear genomes. Assembly of complete mitochondrial genome sequences is further complicated by the fact that many de novo assemblers are not designed for circular genomes and by the presence of repeats in the mitochondrial genomes of some species. In this article, we describe the Statistical Mitogenome Assembly with RepeaTs (SMART) pipeline for automated assembly of mitochondrial genomes from WGS data. SMART first uses an efficient coverage-based filter to select a subset of reads enriched in mtDNA sequences. Contigs produced by an initial assembly step are filtered using Basic Local Alignment Search Tool (BLAST) searches against a comprehensive mitochondrial genome database and are then used as "baits" for an alignment-based filter that produces the set of reads used in a second de novo assembly and scaffolding step. In the presence of repeats, the possible paths through the assembly graph are evaluated using a maximum likelihood model. Additionally, the assembly process is repeated a user-specified number of times on resampled subsets of reads, and the reconstructed sequences with the highest bootstrap support are selected for annotation. Experiments on WGS data sets from a variety of species show that the SMART pipeline produces complete circular mitochondrial genome sequences with a higher success rate than current state-of-the-art tools, particularly for low-coverage WGS data sets.
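The bootstrap selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the SMART implementation: the function names, the `assemble` callback, the `fraction` and `n_rounds` parameters, and the use of the lexicographically smallest rotation to compare circular sequences are all assumptions made for illustration. The sketch also ignores reverse complements, which a real pipeline would need to handle.

```python
import random
from collections import Counter


def canonical_rotation(seq):
    """Return the lexicographically smallest rotation of a circular sequence.

    Two assemblies of the same circular genome may start at different
    positions; mapping each to one canonical rotation makes them comparable.
    (Quadratic time; fine for a sketch, not for production use.)
    """
    if not seq:
        return seq
    n = len(seq)
    doubled = seq + seq
    return min(doubled[i:i + n] for i in range(n))


def bootstrap_support(reads, assemble, n_rounds=10, fraction=0.8, seed=0):
    """Assemble resampled read subsets and return (best_sequence, support).

    `assemble` is a hypothetical callback that takes a list of reads and
    returns one circular sequence as a string, or None on failure.
    Support is the fraction of rounds that produced the winning sequence.
    """
    rng = random.Random(seed)
    k = max(1, int(len(reads) * fraction))
    counts = Counter()
    for _ in range(n_rounds):
        subset = rng.sample(reads, k)  # resample without replacement
        result = assemble(subset)
        if result is not None:
            counts[canonical_rotation(result)] += 1
    if not counts:
        return None, 0.0
    best, hits = counts.most_common(1)[0]
    return best, hits / n_rounds
```

In this sketch, a sequence reconstructed in every round gets support 1.0, and rounds in which the hypothetical assembler fails lower the support of every candidate.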
Pages: 1407-1421
Page count: 15