Multi-perspective quality control of Illumina exome sequencing data using QC3

Cited by: 76
Authors
Guo, Yan [1]
Zhao, Shilin [1]
Sheng, Quanhu [1]
Ye, Fei [1]
Li, Jiang [1]
Lehmann, Brian [3]
Pietenpol, Jennifer [3]
Samuels, David C. [2]
Shyr, Yu [1]
Affiliations
[1] Vanderbilt Ingram Canc Ctr, Ctr Quantitat Sci, Nashville, TN 37232 USA
[2] Vanderbilt Univ, Med Ctr, Ctr Human Genet Res, Nashville, TN USA
[3] Vanderbilt Univ, Dept Biochem, Nashville, TN 37027 USA
Keywords
Quality control; Exome sequencing; Raw data; Alignment; Variant call; DISCOVERY; VARIANTS; FORMAT
DOI
10.1016/j.ygeno.2014.03.006
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
Advances in next-generation sequencing (NGS) technologies have greatly improved our ability to detect genomic variants for biomedical research. The advance in NGS technologies has also created significant challenges in bioinformatics. One of the major challenges is the quality control of sequencing data. There has been heavy focus on performing raw data quality control. In order to correctly interpret the quality of the DNA sequencing data, however, proper quality control should be conducted at all stages of DNA sequencing data analysis: raw data, alignment, and variant detection. We designed QC3, a quality control tool aimed at those three major stages of DNA sequencing. QC3 monitors quality control metrics at each stage of NGS data and provides unique and independent evaluations of the data quality from different perspectives. QC3 offers unique features such as detection of batch effect and cross contamination. QC3 and its source code are freely downloadable at https://github.com/slzhao/QC3. (C) 2014 Elsevier Inc. All rights reserved.
Pages: 323-328
Number of pages: 6