Kraken: A set of tools for quality control and analysis of high-throughput sequence data

Cited by: 293
Authors
Davis, Matthew P. A. [1 ]
van Dongen, Stijn [1 ]
Abreu-Goodger, Cei [2 ]
Bartonicek, Nenad [1 ]
Enright, Anton J. [1 ]
Affiliations
[1] EMBL European Bioinformatics Institute, Cambridge CB10 1SD, England
[2] National Laboratory of Genomics for Biodiversity (Langebio), Guanajuato, Mexico
Funding
Biotechnology and Biological Sciences Research Council (UK);
Keywords
Algorithms; Tools; RNAseq; NGS; Next-generation sequencing; Sequencing; Adapter trimming; Pipelines;
DOI
10.1016/j.ymeth.2013.06.027
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
New sequencing technologies pose significant challenges in terms of data complexity and magnitude. It is essential that efficient software is developed with performance that scales with this growth in sequence information. Here we present a comprehensive and integrated set of tools for the analysis of data from large-scale sequencing experiments. It supports adapter detection and removal, demultiplexing of barcodes, paired-end data, a range of read architectures and the efficient removal of sequence redundancy. Sequences can be trimmed and filtered based on length, quality and complexity. Quality control plots track sequence length, composition and summary statistics with respect to genomic annotation. Several use cases have been integrated into a single streamlined pipeline, including both mRNA and small RNA sequencing experiments. This pipeline interfaces with existing tools for genomic mapping and differential expression analysis. © 2013 The Authors. Published by Elsevier Inc. All rights reserved.
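The preprocessing steps named in the abstract (adapter removal, filtering on length, quality and complexity, and removal of sequence redundancy) can be illustrated with a minimal Python sketch. This is not the Kraken implementation: every function name, threshold and the dinucleotide-based complexity score below are assumptions chosen for illustration, and adapter detection is simplified to an exact substring match.

```python
from collections import Counter


def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(c) - offset for c in quality) / len(quality)


def complexity(seq):
    """Illustrative complexity score: distinct dinucleotides / all dinucleotides."""
    if len(seq) < 2:
        return 0.0
    dinucs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return len(set(dinucs)) / len(dinucs)


def trim_adapter(seq, qual, adapter):
    """Cut the read at the first exact occurrence of the adapter (simplification)."""
    idx = seq.find(adapter)
    if idx == -1:
        return seq, qual
    return seq[:idx], qual[:idx]


def filter_and_collapse(reads, adapter, min_len=18, min_qual=20.0,
                        min_complexity=0.3):
    """Trim, filter, and collapse identical reads into sequence -> count."""
    counts = Counter()
    for seq, qual in reads:
        seq, qual = trim_adapter(seq, qual, adapter)
        if len(seq) < min_len:
            continue  # too short after trimming
        if mean_phred(qual) < min_qual:
            continue  # low average base quality
        if complexity(seq) < min_complexity:
            continue  # low-complexity read (e.g. homopolymer)
        counts[seq] += 1  # identical sequences collapse into one entry
    return counts


ADAPTER = "TGGAATTC"  # hypothetical adapter sequence
reads = [
    ("ACGTTGCAAGCTAGCTAGGATC" + ADAPTER, "I" * 30),  # kept
    ("ACGTTGCAAGCTAGCTAGGATC" + ADAPTER, "I" * 30),  # duplicate, collapsed
    ("A" * 25 + ADAPTER, "I" * 33),                  # low complexity
    ("ACGTACGT" + ADAPTER, "I" * 16),                # too short after trimming
    ("GATTACAGATTACAGGCTTAACGG", "#" * 24),          # low quality
]
collapsed = filter_and_collapse(reads, ADAPTER)
```

Collapsing identical reads into a count table before mapping means each distinct sequence is processed once, which is one common way to reduce redundancy in small RNA data.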
Pages: 41-49
Page count: 9