Identifying statistically significant chromatin contacts from Hi-C data with FitHiC2

Cited by: 117

Authors:

Kaul, Arya ^{[1,4]}

Bhattacharyya, Sourya ^{[2]}

Ay, Ferhat ^{[2,3]}

Affiliations:

[1] Univ Calif San Diego, Dept Bioengn, La Jolla, CA 92093 USA

[2] La Jolla Inst Immunol, Div Vaccine Discovery, La Jolla, CA 92037 USA

[3] Univ Calif San Diego, Sch Med, La Jolla, CA 92093 USA

[4] Harvard Med Sch, Dept Biomed Informat, Boston, MA 02115 USA

Source:

NATURE PROTOCOLS | 2020 / Volume 15 / Issue 03

Keywords:

REVEALS; GENOME; ORGANIZATION; PRINCIPLES; MODEL; MAP;

DOI:

10.1038/s41596-019-0273-0

Chinese Library Classification number:

Q5 [Biochemistry];

Discipline classification code:

071010 ; 081704 ;

Abstract:

Fit-Hi-C is a programming application to compute statistical confidence estimates for Hi-C contact maps to identify significant chromatin contacts. By fitting a monotonically non-increasing spline, Fit-Hi-C captures the relationship between genomic distance and contact probability without any parametric assumption. The spline fit together with the correction of contact probabilities with respect to bin- or locus-specific biases accounts for previously characterized covariates impacting Hi-C contact counts. Fit-Hi-C is best applied for the study of mid-range (e.g., 20 kb-2 Mb for human genome) intra-chromosomal contacts; however, with the latest reimplementation, named FitHiC2, it is possible to perform genome-wide analysis for high-resolution Hi-C data, including all intra-chromosomal distances and inter-chromosomal contacts. FitHiC2 also offers a merging filter module, which eliminates indirect/bystander interactions, leading to significant reduction in the number of reported contacts without sacrificing recovery of key loops such as those between convergent CTCF binding sites. Here, we describe how to apply the FitHiC2 protocol to three use cases: (i) 5-kb resolution Hi-C data of chromosome 5 from GM12878 (a human lymphoblastoid cell line), (ii) 40-kb resolution whole-genome Hi-C data from IMR90 (human lung fibroblast), and (iii) budding yeast whole-genome Hi-C data at a single restriction cut site (EcoRI) resolution. The procedure takes 12 h with preprocessing when all use cases are run sequentially (4 h when run in parallel). With the recent improvements in its implementation, FitHiC2 (8 processors and 16 GB memory) is also scalable to genome-wide analysis of the highest resolution (1 kb) Hi-C data available to date (48 h with 32 GB peak memory). FitHiC2 is available through Bioconda, GitHub and the Python Package Index.

Fit-Hi-C is a computational tool for identifying statistically significant contacts from Hi-C data. This protocol describes how to apply the new version, called FitHiC2, on high-resolution Hi-C data, demonstrating the added functionalities.
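
The idea summarized in the abstract (a distance-dependent null model that is forced to be non-increasing, then adjusted by bin-specific biases) can be illustrated with a minimal sketch. This is not FitHiC2's implementation: a pool-adjacent-violators fit stands in for the spline, the binomial upper tail is an assumed choice of significance test, and every function and parameter name below is hypothetical.

```python
import math


def fit_non_increasing(values):
    """Least-squares non-increasing fit via pool-adjacent-violators.

    Assumption: this isotonic fit stands in for the monotonically
    non-increasing spline mentioned in the abstract.
    """
    blocks = []  # each block is [mean, number_of_points]
    for v in values:
        blocks.append([float(v), 1])
        # Merge while a later block is larger than the one before it,
        # which would violate the non-increasing constraint.
        while len(blocks) > 1 and blocks[-1][0] > blocks[-2][0]:
            m2, n2 = blocks.pop()
            m1, n1 = blocks.pop()
            blocks.append([(m1 * n1 + m2 * n2) / (n1 + n2), n1 + n2])
    fitted = []
    for mean, n in blocks:
        fitted.extend([mean] * n)
    return fitted


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed directly (small n only)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def contact_p_value(count, total, prior_prob, bias_i=1.0, bias_j=1.0):
    """Hypothetical helper: scale the distance-based prior by both bin
    biases, then ask how surprising the observed count is."""
    p = min(1.0, prior_prob * bias_i * bias_j)
    return binomial_upper_tail(count, total, p)


# Toy average contact probabilities, ordered by increasing genomic distance.
raw = [0.30, 0.20, 0.25, 0.10, 0.12, 0.03]
prior = fit_non_increasing(raw)
p_val = contact_p_value(count=9, total=20, prior_prob=prior[3])
```

In this toy input the two out-of-order pairs (0.20, 0.25 and 0.10, 0.12) are each pooled to their mean, so the fitted prior never rises with distance; a contact whose count is far above what that prior predicts receives a small p-value.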

Pages: 991-1012

Page count: 22

50 items in total

[41] HiC-ACT: improved detection of chromatin interactions from Hi-C data via aggregated Cauchy test
Lagler, Taylor M.
Abnousi, Armen
Hu, Ming
Yang, Yuchen
Li, Yun
AMERICAN JOURNAL OF HUMAN GENETICS, 2021, 108 (02) : 257 - 268
[42] Inferring Radial Organization of Chromosomal Territories from Hi-C Data
Das, Priyojit
Sanders, Jacob T.
Shen, Tongye
McCord, Rachel P.
BIOPHYSICAL JOURNAL, 2020, 118 (03) : 549A - 549A
[43] Graph-Based Genome Inference from Hi-C Data
Shen, Yihang
Yu, Lingge
Qiu, Yutong
Zhang, Tianyu
Kingsford, Carl
RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY, RECOMB 2024, 2024, 14758 : 115 - 130
[44] Translocation detection from Hi-C data via scan statistics
Cheng, Anthony
Mao, Disheng
Zhang, Yuping
Glaz, Joseph
Ouyang, Zhengqing
BIOMETRICS, 2023, 79 (02) : 1306 - 1317
[45] FIREcaller: Detecting frequently interacting regions from Hi-C data
Crowley, Cheynna
Yang, Yuchen
Qiu, Yunjiang
Hu, Benxia
Abnousi, Armen
Lipinski, Jakub
Plewczynski, Dariusz
Wu, Di
Won, Hyejung
Ren, Bing
Hu, Ming
Li, Yun
COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL, 2021, 19 : 355 - 362
[46] Unsupervised Learning from Noisy Networks with Applications to Hi-C Data
Wang, Bo
Zhu, Junjie
Ursu, Oana
Pourshafeie, Armin
Batzoglou, Serafim
Kundaje, Anshul
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
[47] TADfit is a multivariate linear regression model for profiling hierarchical chromatin domains on replicate Hi-C data
Liu, Erhu
Lyu, Hongqiang
Peng, Qinke
Liu, Yuan
Wang, Tian
Han, Jiuqiang
COMMUNICATIONS BIOLOGY, 5
[48] TADfit is a multivariate linear regression model for profiling hierarchical chromatin domains on replicate Hi-C data
Liu, Erhu
Lyu, Hongqiang
Peng, Qinke
Liu, Yuan
Wang, Tian
Han, Jiuqiang
COMMUNICATIONS BIOLOGY, 2022, 5 (01)
[49] Inferring time series chromatin states for promoter-enhancer pairs based on Hi-C data
Miko, Henriette
Qiu, Yunjiang
Gaertner, Bjoern
Sander, Maike
Ohler, Uwe
BMC GENOMICS, 2021, 22 (01)
[50] Pgltools: a genomic arithmetic tool suite for manipulation of Hi-C peak and other chromatin interaction data
Greenwald, William W.
Li, He
Smith, Erin N.
Benaglio, Paola
Nariai, Naoki
Frazer, Kelly A.
BMC BIOINFORMATICS, 18