Indexcov: fast coverage quality control for whole-genome sequencing

Cited by: 37
Authors
Pedersen, Brent S. [1, 3]
Collins, Ryan L. [4, 6, 7, 8]
Talkowski, Michael E. [4, 5, 6, 7, 8]
Quinlan, Aaron R. [1, 2, 3]
Affiliations
[1] Univ Utah, Dept Human Genet, 15 S 2030 E, Salt Lake City, UT 84112 USA
[2] Univ Utah, Dept Biomed Informat, 421 Wakara Way 140, Salt Lake City, UT 84108 USA
[3] Univ Utah, USTAR Ctr Genet Discovery, 15 S 2030 E, Salt Lake City, UT 84112 USA
[4] Massachusetts Gen Hosp, Ctr Genom Med, 55 Fruit St, Boston, MA 02114 USA
[5] Harvard Med Sch, Dept Neurol, A-111,25 Shattuck St, Boston, MA 02115 USA
[6] Broad Inst, Program Med & Populat Genet, 415 Main St, Cambridge, MA 02142 USA
[7] Broad Inst, Stanley Ctr Psychiat Res, 415 Main St, Cambridge, MA 02142 USA
[8] Harvard Med Sch, Div Med Sci, Program Bioinformat & Integrat Genom, A-111,25 Shattuck St, Boston, MA 02115 USA
Keywords
FEATURES;
DOI
10.1093/gigascience/gix090
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07; 0710; 09;
Abstract
The BAM and CRAM formats provide a supplementary linear index that facilitates rapid access to sequence alignments in arbitrary genomic regions. Comparing consecutive entries in a BAM or CRAM index allows one to infer the number of alignment records per genomic region for use as an effective proxy of sequence depth in each genomic region. Based on these properties, we have developed indexcov, an efficient estimator of whole-genome sequencing coverage to rapidly identify samples with aberrant coverage profiles, reveal large-scale chromosomal anomalies, recognize potential batch effects, and infer the sex of a sample. Indexcov is available at https://github.com/brentp/goleft under the MIT license.
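The idea of comparing consecutive index entries can be illustrated with a minimal sketch. It is not indexcov's actual code, and it rests on several assumptions: the BAM linear index is taken to hold one BGZF virtual offset per 16,384-base tile (as described in the SAM/BAM specification), the number of compressed bytes between consecutive entries stands in for the number of alignment records in a tile, and the function names and the toy offsets below are hypothetical.

```python
from statistics import median

# Assumption: each BAM linear-index entry covers one 16,384-base tile.
TILE_SIZE = 16_384


def file_offset(virtual_offset: int) -> int:
    """Return the compressed-file offset stored in the upper bits of a BGZF virtual offset."""
    return virtual_offset >> 16


def tile_sizes(linear_index: list[int]) -> list[int]:
    """Compressed bytes between consecutive linear-index entries (one value per tile)."""
    offsets = [file_offset(v) for v in linear_index]
    return [later - earlier for earlier, later in zip(offsets, offsets[1:])]


def scaled_depth(linear_index: list[int]) -> list[float]:
    """Scale each tile's size by the median non-empty tile, so typical coverage is about 1.0."""
    sizes = tile_sizes(linear_index)
    nonzero = [s for s in sizes if s > 0]
    if not nonzero:
        return [0.0] * len(sizes)
    typical = median(nonzero)
    return [s / typical for s in sizes]


# Hypothetical index for one chromosome: byte offsets shifted into virtual-offset form.
toy_index = [b << 16 for b in (0, 1000, 2000, 3000, 3500, 4500)]
depths = scaled_depth(toy_index)
```

In this toy index the fourth tile spans half as many bytes as its neighbours, so its scaled depth comes out at 0.5, the kind of signal that could hint at a heterozygous deletion or a single-copy sex chromosome. Real indexes are noisier, and CRAM indexes have a different layout, so they would need separate handling.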
Pages: 6