Magic-BLAST, an accurate RNA-seq aligner for long and short reads

Cited by: 205
Authors
Boratyn, Grzegorz M. [1]
Thierry-Mieg, Jean [1]
Thierry-Mieg, Danielle [1]
Busby, Ben [1]
Madden, Thomas L. [1]
Affiliations
[1] National Library of Medicine, National Center for Biotechnology Information, NIH, 8600 Rockville Pike, Bethesda, MD 20894, USA
Funding
National Institutes of Health (USA);
Keywords
RNA-seq; BLAST; Alignment; GENERATION; ALIGNMENT;
DOI
10.1186/s12859-019-2996-x
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Next-generation sequencing technologies can produce tens of millions of reads, often paired-end, from transcripts or genomes. But few programs can align RNA on the genome and accurately discover introns, especially with long reads. We introduce Magic-BLAST, a new aligner based on ideas from the Magic pipeline. Results: Magic-BLAST uses innovative techniques that include the optimization of a spliced alignment score and selective masking during seed selection. We evaluate the performance of Magic-BLAST to accurately map short or long sequences and its ability to discover introns on real RNA-seq data sets from PacBio, Roche and Illumina runs, and on six benchmarks, and compare it to other popular aligners. Additionally, we look at alignments of human idealized RefSeq mRNA sequences perfectly matching the genome. Conclusions: We show that Magic-BLAST is the best at intron discovery over a wide range of conditions and the best at mapping reads longer than 250 bases, from any platform. It is versatile and robust to high levels of mismatches or extreme base composition, and reasonably fast. It can align reads to a BLAST database or a FASTA file. It can accept a FASTQ file as input or automatically retrieve an accession from the SRA repository at the NCBI.
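The input and reference combinations named at the end of the abstract (FASTQ file or SRA accession as input; BLAST database or FASTA file as reference) can be illustrated with a small sketch. The helper name `magicblast_command`, the file names, and the accession below are hypothetical placeholders that do not appear in the record. The flags `-query`, `-infmt`, `-sra`, `-db`, `-subject`, and `-out` are given as recalled from the Magic-BLAST command-line documentation and should be checked against `magicblast -help` for the installed version. The function only assembles an argument list; it does not run the aligner.

```python
from typing import List, Optional


def magicblast_command(
    reads_fastq: Optional[str] = None,      # hypothetical path to a FASTQ file
    sra_accession: Optional[str] = None,    # hypothetical SRA run accession
    blast_db: Optional[str] = None,         # hypothetical BLAST database name
    reference_fasta: Optional[str] = None,  # hypothetical FASTA reference path
    out_path: str = "alignments.sam",       # hypothetical output file name
) -> List[str]:
    """Assemble a magicblast argument list (assumed flags; nothing is executed)."""
    # Exactly one read source: a local FASTQ file or an SRA accession.
    if (reads_fastq is None) == (sra_accession is None):
        raise ValueError("give exactly one of reads_fastq or sra_accession")
    # Exactly one reference: a BLAST database or a FASTA file.
    if (blast_db is None) == (reference_fasta is None):
        raise ValueError("give exactly one of blast_db or reference_fasta")

    cmd = ["magicblast"]
    if reads_fastq is not None:
        cmd += ["-query", reads_fastq, "-infmt", "fastq"]
    else:
        cmd += ["-sra", sra_accession]
    if blast_db is not None:
        cmd += ["-db", blast_db]
    else:
        cmd += ["-subject", reference_fasta]
    cmd += ["-out", out_path]
    return cmd


if __name__ == "__main__":
    # Print the two example command lines; pass a list to subprocess.run to execute.
    print(" ".join(magicblast_command(reads_fastq="reads.fastq", reference_fasta="genome.fa")))
    print(" ".join(magicblast_command(sra_accession="SRR0000000", blast_db="my_genome")))
```

Returning a list rather than a shell string keeps file names with spaces intact if the command is later passed to `subprocess.run`.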
Pages: 19