BlackOPs: increasing confidence in variant detection through mappability filtering

Cited by: 16
Authors
Cabanski, Christopher R. [1, 2]
Wilkerson, Matthew D. [3, 4]
Soloway, Matthew [3]
Parker, Joel S. [3, 4]
Liu, Jinze [5]
Prins, Jan F. [6]
Marron, J. S. [1, 3]
Perou, Charles M. [3, 4]
Hayes, D. Neil [3, 7]
Affiliations
[1] Univ N Carolina, Dept Stat & Operat Res, Chapel Hill, NC 27599 USA
[2] Washington Univ, Genome Inst, St Louis, MO 63108 USA
[3] Univ N Carolina, Lineberger Comprehens Canc Ctr, Chapel Hill, NC 27599 USA
[4] Univ N Carolina, Dept Genet, Chapel Hill, NC 27599 USA
[5] Univ Kentucky, Dept Comp Sci, Lexington, KY 40506 USA
[6] Univ N Carolina, Dept Comp Sci, Chapel Hill, NC 27599 USA
[7] Univ N Carolina, Dept Internal Med, Div Med Oncol, Chapel Hill, NC 27599 USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
Keywords
TERT PROMOTER MUTATIONS; CELL LUNG-CANCER; ALIGNMENT; EXPRESSION; GENOME; GENERATION; DISCOVERY; CAPTURE; EXOME; DBSNP;
DOI
10.1093/nar/gkt692
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Identifying variants using high-throughput sequencing data is currently a challenge because true biological variants can be indistinguishable from technical artifacts. One source of technical artifact results from incorrectly aligning experimentally observed sequences to their true genomic origin ('mismapping') and inferring differences in mismapped sequences to be true variants. We developed BlackOPs, an open-source tool that simulates experimental RNA-seq and DNA whole exome sequences derived from the reference genome, aligns these sequences by custom parameters, detects variants and outputs a blacklist of positions and alleles caused by mismapping. Blacklists contain thousands of artifact variants that are indistinguishable from true variants and, for a given sample, are expected to be almost completely false positives. We show that these blacklist positions are specific to the alignment algorithm and read length used, and BlackOPs allows users to generate a blacklist specific to their experimental setup. We queried the dbSNP and COSMIC variant databases and found numerous variants indistinguishable from mapping errors. We demonstrate how filtering against blacklist positions reduces the number of potential false variants using an RNA-seq glioblastoma cell line data set. In summary, accounting for mapping-caused variants tuned to experimental setups reduces false positives and, therefore, improves genome characterization by high-throughput sequencing.
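The blacklist idea in the abstract can be illustrated with a toy sketch. This is not the BlackOPs implementation: the one-chromosome reference, the `toy_align` first-hit aligner, and every function name below are hypothetical stand-ins chosen only to show the logic of simulating error-free reads from the reference, aligning them, recording any "variant" the alignment produces, and filtering sample calls against that list.

```python
from typing import Dict, List, Optional, Set, Tuple

# (chromosome, 1-based position, alternate allele)
Variant = Tuple[str, int, str]

# Hypothetical reference: two near-identical 8-mers (positions 1-8 and 15-22)
# that differ at a single base, separated by a short spacer.
REFERENCE: Dict[str, str] = {"chrToy": "AACCGGTT" + "CATCAT" + "AACCGTTT"}


def simulate_reads(reference: Dict[str, str], read_len: int) -> List[str]:
    """Tile every error-free read of length read_len from the reference."""
    return [
        seq[i:i + read_len]
        for seq in reference.values()
        for i in range(len(seq) - read_len + 1)
    ]


def toy_align(read: str, reference: Dict[str, str],
              max_mismatches: int = 1) -> Optional[Tuple[str, int]]:
    """Stand-in aligner: return the first location (chromosome, 0-based start)
    whose Hamming distance to the read is within max_mismatches."""
    for chrom, seq in reference.items():
        for start in range(len(seq) - len(read) + 1):
            window = seq[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                return chrom, start
    return None


def build_blacklist(reference: Dict[str, str], read_len: int) -> Set[Variant]:
    """Reads come straight from the reference, so any mismatch after
    alignment can only be a mapping artifact. Collect those as a blacklist."""
    blacklist: Set[Variant] = set()
    for read in simulate_reads(reference, read_len):
        hit = toy_align(read, reference)
        if hit is None:
            continue
        chrom, start = hit
        for offset, base in enumerate(read):
            if base != reference[chrom][start + offset]:
                blacklist.add((chrom, start + offset + 1, base))
    return blacklist


def filter_variants(calls: List[Variant], blacklist: Set[Variant]) -> List[Variant]:
    """Drop calls whose position and allele match a blacklist entry."""
    return [call for call in calls if call not in blacklist]


blacklist = build_blacklist(REFERENCE, read_len=8)
sample_calls: List[Variant] = [("chrToy", 6, "T"), ("chrToy", 10, "G")]
kept = filter_variants(sample_calls, blacklist)
```

In this toy setup the read drawn from the second 8-mer is placed on the first one, which yields a spurious G-to-T call at position 6. That call is blacklisted and removed from `sample_calls`, while the call at position 10 is kept. Changing the read length or the mismatch tolerance changes which positions are blacklisted, which mirrors the abstract's point that blacklists are specific to the alignment setup.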
Pages: 10