A cancer cell-line titration series for evaluating somatic classification

Cited by: 10
Authors
Denroche R.E. [1]
Mullen L. [1]
Timms L. [1]
Beck T. [1]
Yung C.K. [1]
Stein L. [1,2]
McPherson J.D. [1,3]
Brown A.M.K. [1]
Affiliations
[1] Ontario Institute for Cancer Research, Toronto, ON
[2] Department of Molecular Genetics, University of Toronto, Toronto, ON
[3] Department of Medical Biophysics, University of Toronto, Toronto, ON
Keywords
Cancer bioinformatics; Normal contamination; Somatic mutation calling; Tumour cellularity; Whole exome sequencing dataset
DOI
10.1186/s13104-015-1803-7
Abstract
Background: Accurate detection of somatic single nucleotide variants and small insertions and deletions from DNA sequencing experiments of tumour-normal pairs is a challenging task. Tumour samples are often contaminated with normal cells, which confounds the available evidence for somatic variants. Furthermore, tumours are heterogeneous, so sub-clonal variants are observed at reduced allele frequencies. We present here a cell-line titration series dataset that can be used to evaluate somatic variant calling pipelines, with the goal of reliably calling true somatic mutations at low allele frequencies.
Results: Cell-line DNA was mixed with matched normal DNA at 8 different ratios to generate samples with known tumour cellularities, and exome sequenced on Illumina HiSeq to depths of >300×. The data were processed with several different variant calling pipelines, and verification experiments were performed to assay >1500 somatic variant candidates using Ion Torrent PGM as an orthogonal technology. By examining the variants called at varying cellularities and depths of coverage, we show that the best performing pipelines are able to maintain a high level of precision at any cellularity. In addition, we estimate the number of true somatic variants that go undetected as cellularity and coverage decrease.
Conclusions: Our cell-line titration series dataset, along with the associated verification results, was effective for this evaluation and will serve as a valuable dataset for future somatic calling algorithm development. The data are available for further analysis at the European Genome-phenome Archive under accession number EGAS00001001016. Data access requires registration through the International Cancer Genome Consortium's Data Access Compliance Office (ICGC DACO). © 2015 Denroche et al.
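To illustrate why reduced cellularity makes somatic variants harder to detect, the following is a minimal sketch, not the authors' method. It assumes a clonal, heterozygous variant at a diploid, copy-neutral locus, so the expected variant allele frequency is half the tumour cellularity, and it models read sampling as binomial. The function names, the example cellularities, and the alt-read threshold are all hypothetical and are not taken from the paper.

```python
from math import comb


def expected_vaf(cellularity: float) -> float:
    """Expected variant allele frequency of a clonal heterozygous somatic
    variant, ASSUMING diploid, copy-neutral tumour and normal genomes."""
    if not 0.0 <= cellularity <= 1.0:
        raise ValueError("cellularity must be between 0 and 1")
    return cellularity / 2.0


def prob_at_least_k_alt_reads(depth: int, vaf: float, k: int) -> float:
    """Binomial probability of observing at least k variant-supporting reads
    at a site covered by `depth` reads (ignores sequencing error)."""
    p_fewer = sum(
        comb(depth, i) * vaf**i * (1.0 - vaf) ** (depth - i) for i in range(k)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    # Hypothetical cellularities and threshold, chosen only for illustration.
    for cellularity in (0.8, 0.4, 0.1, 0.05):
        vaf = expected_vaf(cellularity)
        p = prob_at_least_k_alt_reads(depth=300, vaf=vaf, k=8)
        print(f"cellularity={cellularity:.2f}  VAF={vaf:.3f}  P(>=8 alt reads)={p:.3f}")
```

Under these assumptions, lowering either the cellularity or the depth lowers the chance that enough supporting reads are sampled, which is the effect the titration series is designed to measure empirically.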