PathoQC: Computationally Efficient Read Preprocessing and Quality Control for High-Throughput Sequencing Data Sets

Cited by: 4
Authors
Hong, Changjin [1 ,2 ]
Manimaran, Solaiappan [1 ]
Johnson, William [1 ]
Affiliations
[1] Boston Univ, Sch Med, Div Computat Biomed, Boston, MA 02215 USA
[2] Nationwide Childrens Hosp, Cytogenet Mol Genet Lab, Columbus, OH 43205 USA
Funding
National Institutes of Health (USA);
Keywords
sequencing read preprocessing; sequencing quality control; parallel processing; metagenomics;
DOI
10.4137/CIN.S13890
Chinese Library Classification number
R73 [Oncology];
Discipline classification code
100214 ;
Abstract
Quality control and read preprocessing are critical steps in the analysis of data sets generated from high-throughput genomic screens. In the most extreme cases, improper preprocessing can negatively affect downstream analyses and may lead to incorrect biological conclusions. Here, we present PathoQC, a streamlined toolkit that seamlessly combines the benefits of several popular quality control software approaches for preprocessing next-generation sequencing data. PathoQC provides a variety of quality control options appropriate for most high-throughput sequencing applications. PathoQC is primarily developed as a module in the PathoScope software suite for metagenomic analysis. However, PathoQC is also available as an open-source Python module that can run as a stand-alone application or can be easily integrated into any bioinformatics workflow. PathoQC achieves high performance by supporting parallel computation and is an effective tool that removes technical sequencing artifacts and facilitates robust downstream analysis. The PathoQC software package is available at http://sourceforge.net/projects/PathoScope/.
Pages: 167-176
Page count: 10
Related papers
46 in total
[1]   Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries [J].
Aird, Daniel ;
Ross, Michael G. ;
Chen, Wei-Sheng ;
Danielsson, Maxwell ;
Fennell, Timothy ;
Russ, Carsten ;
Jaffe, David B. ;
Nusbaum, Chad ;
Gnirke, Andreas .
GENOME BIOLOGY, 2011, 12 (02)
[2]   High-throughput decoding of antitrypanosomal drug efficacy and resistance [J].
Alsford, Sam ;
Eckert, Sabine ;
Baker, Nicola ;
Glover, Lucy ;
Sanchez-Flores, Alejandro ;
Leung, Ka Fai ;
Turner, Daniel J. ;
Field, Mark C. ;
Berriman, Matthew ;
Horn, David .
NATURE, 2012, 482 (7384) :232-U125
[3]   HTSeq-a Python framework to work with high-throughput sequencing data [J].
Anders, Simon ;
Pyl, Paul Theodor ;
Huber, Wolfgang .
BIOINFORMATICS, 2015, 31 (02) :166-169
[4]   The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants [J].
Cock, Peter J. A. ;
Fields, Christopher J. ;
Goto, Naohisa ;
Heuer, Michael L. ;
Rice, Peter M. .
NUCLEIC ACIDS RESEARCH, 2010, 38 (06) :1767-1771
[5]   Pharmacogenomics and Individualized Medicine: Translating Science Into Practice [J].
Crews, K. R. ;
Hicks, J. K. ;
Pui, C-H ;
Relling, M. V. ;
Evans, W. E. .
CLINICAL PHARMACOLOGY & THERAPEUTICS, 2012, 92 (04) :467-475
[6]   An Extensive Evaluation of Read Trimming Effects on Illumina NGS Data Analysis [J].
Del Fabbro, Cristian ;
Scalabrin, Simone ;
Morgante, Michele ;
Giorgi, Federico M. .
PLOS ONE, 2013, 8 (12)
[7]   Pathoscope: Species identification and strain attribution with unassembled sequencing data [J].
Francis, Owen E. ;
Bendall, Matthew ;
Manimaran, Solaiappan ;
Hong, Changjin ;
Clement, Nathan L. ;
Castro-Nallar, Eduardo ;
Snell, Quinn ;
Schaalje, G. Bruce ;
Clement, Mark J. ;
Crandall, Keith A. ;
Johnson, W. Evan .
GENOME RESEARCH, 2013, 23 (10) :1721-1729
[8]   Modeling the next generation sequencing sample processing pipeline for the purposes of classification [J].
Ghaffari, Noushin ;
Yousefi, Mohammadmahdi R. ;
Johnson, Charles D. ;
Ivanov, Ivan ;
Dougherty, Edward R. .
BMC BIOINFORMATICS, 2013, 14
[9]  
Gusfield D, 1997, ALGORITHMS STRINGS T
[10]   De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis [J].
Haas, Brian J. ;
Papanicolaou, Alexie ;
Yassour, Moran ;
Grabherr, Manfred ;
Blood, Philip D. ;
Bowden, Joshua ;
Couger, Matthew Brian ;
Eccles, David ;
Li, Bo ;
Lieber, Matthias ;
MacManes, Matthew D. ;
Ott, Michael ;
Orvis, Joshua ;
Pochet, Nathalie ;
Strozzi, Francesco ;
Weeks, Nathan ;
Westerman, Rick ;
William, Thomas ;
Dewey, Colin N. ;
Henschel, Robert ;
Leduc, Richard D. ;
Friedman, Nir ;
Regev, Aviv .
NATURE PROTOCOLS, 2013, 8 (08) :1494-1512