Summarizing and correcting the GC content bias in high-throughput sequencing

Cited by: 598

Authors:

Benjamini, Yuval ^{[1]}

Speed, Terence P. ^{[1,2]}

Affiliations:

[1] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA

[2] Walter & Eliza Hall Inst Med Res, Bioinformat Div, Parkville, Vic 3052, Australia

Source:

NUCLEIC ACIDS RESEARCH | 2012 / Volume 40 / Issue 10

Funding:

National Institutes of Health (USA); National Science Foundation (USA);

Keywords:

HUMAN GENOME; ILLUMINA; ALIGNMENT;

DOI:

10.1093/nar/gks001

Chinese Library Classification (CLC) number:

Q5 [Biochemistry]; Q7 [Molecular Biology];

Discipline classification code:

071010 ; 081704 ;

Abstract:

GC content bias describes the dependence between fragment count (read coverage) and GC content found in Illumina sequencing data. This bias can dominate the signal of interest for analyses that focus on measuring fragment abundance within a genome, such as copy number estimation (DNA-seq). The bias is not consistent between samples; and there is no consensus as to the best methods to remove it in a single sample. We analyze regularities in the GC bias patterns, and find a compact description for this unimodal curve family. It is the GC content of the full DNA fragment, not only the sequenced read, that most influences fragment count. This GC effect is unimodal: both GC-rich fragments and AT-rich fragments are underrepresented in the sequencing results. This empirical evidence strengthens the hypothesis that PCR is the most important cause of the GC bias. We propose a model that produces predictions at the base pair level, allowing strand-specific GC-effect correction regardless of the downstream smoothing or binning. These GC modeling considerations can inform other high-throughput sequencing analyses such as ChIP-seq and RNA-seq.
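As a rough illustration of the idea in the abstract (fragment count depends on the GC content of the whole fragment), the following minimal sketch stratifies candidate fragment positions by GC count and computes the mean number of observed fragments per position in each stratum. It is not the authors' implementation: the function names `gc_count` and `gc_rate_curve`, the single fixed fragment length, and the string-based genome are all assumptions made only for this example.

```python
from collections import defaultdict


def gc_count(seq):
    """Return the number of G or C bases in a sequence (case-insensitive)."""
    s = seq.upper()
    return s.count("G") + s.count("C")


def gc_rate_curve(genome, fragment_starts, fragment_len):
    """Mean number of fragments per position, stratified by fragment GC count.

    genome          : reference sequence as a string (assumption: fits in memory)
    fragment_starts : 0-based start positions of observed fragments
    fragment_len    : one fixed fragment length (hypothetical simplification)
    """
    # Count how many observed fragments start at each position.
    starts_at = defaultdict(int)
    for pos in fragment_starts:
        starts_at[pos] += 1

    positions = defaultdict(int)  # GC count -> number of candidate positions
    fragments = defaultdict(int)  # GC count -> number of fragments starting there
    for pos in range(len(genome) - fragment_len + 1):
        gc = gc_count(genome[pos:pos + fragment_len])
        positions[gc] += 1
        fragments[gc] += starts_at.get(pos, 0)

    # Observed rate per GC stratum: fragments divided by available positions.
    return {gc: fragments[gc] / positions[gc] for gc in positions}
```

Under these assumptions, a unimodal GC effect would appear as rates that peak at intermediate GC counts and fall off toward both the AT-rich and GC-rich ends. Dividing observed counts by the rate predicted for each position's GC stratum is one simple way such a curve could be used for correction.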


Pages: 14

