A computational method for estimating the PCR duplication rate in DNA and RNA-seq experiments

Cited by: 25
Authors
Bansal, Vikas [1]
Affiliations
[1] Univ Calif San Diego, Sch Med, Dept Pediat, 9500 Gilman Dr, La Jolla, CA 92093 USA
Source
BMC BIOINFORMATICS | 2017 / Volume 18
Keywords
PCR duplicates; High-throughput sequencing; Mathematical modeling; Heterozygosity; RNA-seq; Natural duplicates; EXOME CAPTURE; MOLECULES; DISCOVERY; BIAS;
DOI
10.1186/s12859-017-1471-9
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Background: PCR amplification is an important step in the preparation of DNA sequencing libraries prior to high-throughput sequencing. PCR amplification introduces redundant reads in the sequence data, and estimating the PCR duplication rate is important to assess the frequency of such reads. Existing computational methods do not distinguish PCR duplicates from "natural" read duplicates that represent independent DNA fragments and therefore overestimate the PCR duplication rate for DNA-seq and RNA-seq experiments.
Results: In this paper, we present a computational method to estimate the average PCR duplication rate of high-throughput sequence datasets that accounts for natural read duplicates by leveraging heterozygous variants in an individual genome. Analysis of simulated data and exome sequence data from the 1000 Genomes project demonstrated that our method can accurately estimate the PCR duplication rate on paired-end as well as single-end read datasets that contain a high proportion of natural read duplicates. Further, analysis of exome datasets prepared using the Nextera library preparation method indicated that 45-50% of read duplicates correspond to natural read duplicates, likely due to fragmentation bias. Finally, analysis of RNA-seq datasets from individuals in the 1000 Genomes project demonstrated that 70-95% of read duplicates observed in such datasets correspond to natural duplicates sampled from genes with high expression, and identified outlier samples with a 2-fold greater PCR duplication rate than other samples.
Conclusions: The method described here is a useful tool for estimating the PCR duplication rate of high-throughput sequence datasets and for assessing the fraction of read duplicates that correspond to natural read duplicates.
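The intuition behind using heterozygous variants can be illustrated with a small sketch. This is not the estimator from the paper; it is a simplified illustration under loudly stated assumptions: only duplicate clusters of exactly two reads are considered, both alleles at a heterozygous site are sampled with equal probability, and sequencing errors are ignored. Under those assumptions, two PCR copies of one fragment always show the same allele, while two independent (natural) fragments show different alleles about half the time, so the natural fraction is roughly twice the observed discordant fraction. The function name and input format are hypothetical.

```python
def estimate_pcr_fraction(allele_pairs):
    """Estimate the fraction of duplicate pairs that are PCR duplicates.

    allele_pairs: list of (allele_1, allele_2) tuples, one per two-read
    duplicate cluster overlapping a heterozygous site (hypothetical input).

    Simplifying assumptions (not taken from the paper): clusters of size
    two only, equal sampling of both alleles, no sequencing errors.
    """
    if not allele_pairs:
        raise ValueError("need at least one duplicate pair")

    # Pairs whose two reads carry different alleles cannot be PCR copies
    # of the same original fragment.
    discordant = sum(1 for a, b in allele_pairs if a != b)
    discordant_fraction = discordant / len(allele_pairs)

    # Natural duplicates are discordant with probability 0.5, so the
    # natural fraction is about twice the discordant fraction (capped at 1).
    natural_fraction = min(1.0, 2.0 * discordant_fraction)
    return 1.0 - natural_fraction


if __name__ == "__main__":
    # 80 concordant and 20 discordant duplicate pairs (made-up toy data).
    pairs = [("A", "A")] * 80 + [("A", "G")] * 20
    print(estimate_pcr_fraction(pairs))
```

With the toy input, 20% of pairs are discordant, which implies roughly 40% natural duplicates and 60% PCR duplicates. A real dataset would also need to handle larger clusters, allelic bias, and base-calling errors, none of which this sketch attempts.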
Pages: 11