Normalization benchmark of ATAC-seq datasets shows the importance of accounting for GC-content effects
Times cited: 5
Authors:
Van den Berge, Koen [1,2,3,11]
Chou, Hsin-Jung [4,12]
de Bezieux, Hector Roux [5,6]
Street, Kelly [7,8]
Risso, Davide [9]
Ngai, John [4,10,13]
Dudoit, Sandrine [1,5,6]
Affiliations:
[1] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
[2] Univ Ghent, Dept Appl Math Comp Sci & Stat, Ghent, Belgium
[3] Univ Ghent, Bioinformat Inst Ghent, Ghent, Belgium
[4] Univ Calif Berkeley, Dept Mol & Cell Biol, 229 Stanley Hall, Berkeley, CA 94720 USA
[5] Univ Calif Berkeley, Sch Publ Hlth, Div Biostat, Berkeley, CA 94720 USA
[6] Univ Calif Berkeley, Ctr Computat Biol, Berkeley, CA 94720 USA
[7] Dana Farber Canc Inst, Dept Data Sci, Boston, MA 02115 USA
[8] Harvard TH Chan Sch Publ Hlth, Dept Biostat, Boston, MA USA
[9] Univ Padua, Dept Stat Sci, Padua, Italy
[10] Univ Calif Berkeley, Helen Wills Neurosci Inst, Berkeley, CA 94720 USA
[11] Johnson & Johnson, Stat & Decis Sci, Janssen Pharmaceut Co, Beerse, Belgium
[12] Audentes Therapeut Inc, San Francisco, CA USA
[13] NINDS, NIH, Bldg 36,Rm 4D04, Bethesda, MD 20892 USA
Source:
CELL REPORTS METHODS | 2022 / Vol. 2 / Issue 11
Funding:
National Institutes of Health;
Keywords:
REDUCES SYSTEMATIC-ERRORS;
OPEN CHROMATIN;
CHIP-SEQ;
BIOCONDUCTOR PACKAGE;
BIAS;
REPRODUCIBILITY;
LANDSCAPE;
GENES;
DOI:
10.1016/j.crmeth.2022.100321
Chinese Library Classification (CLC) number:
Q5 [Biochemistry];
Discipline classification codes:
071010;
081704;
Abstract:
The assay for transposase-accessible chromatin using sequencing (ATAC-seq) allows the study of epigenetic regulation of gene expression by assessing chromatin configuration for an entire genome. Despite its popularity, there have been limited studies investigating the analytical challenges related to ATAC-seq data, with most studies leveraging tools developed for bulk transcriptome sequencing. Here, we show that GC-content effects are omnipresent in ATAC-seq datasets. Since the GC-content effects are sample specific, they can bias downstream analyses such as clustering and differential accessibility analysis. We introduce a normalization method based on smooth-quantile normalization within GC-content bins and evaluate it together with 11 different normalization procedures on 8 public ATAC-seq datasets. Accounting for GC-content effects in the normalization is crucial for common downstream ATAC-seq data analyses, improving accuracy and interpretability. Through case studies, we show that exploratory data analysis is essential to guide the choice of an appropriate normalization method for a given dataset.
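The abstract describes normalizing within GC-content bins. The sketch below illustrates the general idea of binning peaks by GC content and normalizing separately inside each bin. It is a minimal, hypothetical illustration, not the authors' method. It uses plain quantile normalization within each bin in place of the smooth-quantile normalization described in the paper. It also assumes the following:
- counts form a peaks x samples matrix (a list of rows), already on whatever scale you intend to normalize;
- GC content is given per peak as a fraction;
- bins are equal-frequency bins.

The function names gc_bins, quantile_normalize, and gc_binned_quantile_normalize are made up for this example.

```python
def gc_bins(gc, n_bins):
    """Assign each peak to one of n_bins equal-frequency bins by GC content (hypothetical helper)."""
    order = sorted(range(len(gc)), key=lambda i: gc[i])
    bins = [0] * len(gc)
    for rank, i in enumerate(order):
        # Peaks ranked lowest in GC go to bin 0, highest to bin n_bins - 1.
        bins[i] = rank * n_bins // len(gc)
    return bins


def quantile_normalize(matrix):
    """Plain (not smooth) quantile normalization of a peaks x samples matrix."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    # For each sample (column), the row indices that sort that column ascending.
    orders = [sorted(range(n_rows), key=lambda r: matrix[r][c]) for c in range(n_cols)]
    # Reference distribution: mean across samples of the k-th smallest value.
    reference = [
        sum(matrix[orders[c][k]][c] for c in range(n_cols)) / n_cols
        for k in range(n_rows)
    ]
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for c in range(n_cols):
        for k, r in enumerate(orders[c]):
            # Replace each value by the reference value at the same rank.
            out[r][c] = reference[k]
    return out


def gc_binned_quantile_normalize(counts, gc, n_bins=10):
    """Quantile-normalize samples separately within each GC-content bin (hypothetical sketch)."""
    bins = gc_bins(gc, n_bins)
    out = [None] * len(counts)
    for b in set(bins):
        rows = [i for i, bin_id in enumerate(bins) if bin_id == b]
        normed = quantile_normalize([counts[i] for i in rows])
        # Write the normalized rows back to their original peak positions.
        for i, row in zip(rows, normed):
            out[i] = row
    return out


if __name__ == "__main__":
    counts = [[5, 4], [2, 1], [3, 4], [4, 2]]  # 4 peaks x 2 samples (toy data)
    gc = [0.30, 0.35, 0.60, 0.65]              # GC fraction per peak (toy data)
    print(gc_binned_quantile_normalize(counts, gc, n_bins=2))
    # [[4.5, 4.5], [1.5, 1.5], [2.5, 4.0], [4.0, 2.5]]
```

Because each bin is normalized on its own, samples end up with identical count distributions within every GC stratum. This makes the approach a simple way to remove sample-specific GC-content effects. A smooth-quantile approach would instead relax the assumption that all samples share one distribution.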
Pages: 21