Identifying statistically significant chromatin contacts from Hi-C data with FitHiC2

Cited by: 117
Authors
Kaul, Arya [1, 4]
Bhattacharyya, Sourya [2]
Ay, Ferhat [2, 3]
Affiliations
[1] Univ Calif San Diego, Dept Bioengn, La Jolla, CA 92093 USA
[2] La Jolla Inst Immunol, Div Vaccine Discovery, La Jolla, CA 92037 USA
[3] Univ Calif San Diego, Sch Med, La Jolla, CA 92093 USA
[4] Harvard Med Sch, Dept Biomed Informat, Boston, MA 02115 USA
Keywords
REVEALS; GENOME; ORGANIZATION; PRINCIPLES; MODEL; MAP
DOI
10.1038/s41596-019-0273-0
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Fit-Hi-C is a software tool that computes statistical confidence estimates for Hi-C contact maps in order to identify significant chromatin contacts. By fitting a monotonically non-increasing spline, Fit-Hi-C captures the relationship between genomic distance and contact probability without any parametric assumption. Together with the correction of contact probabilities for bin- or locus-specific biases, this spline fit accounts for previously characterized covariates that affect Hi-C contact counts. Fit-Hi-C is best suited to the study of mid-range (e.g., 20 kb–2 Mb for the human genome) intra-chromosomal contacts; however, its latest reimplementation, FitHiC2, makes it possible to perform genome-wide analysis of high-resolution Hi-C data, covering all intra-chromosomal distances as well as inter-chromosomal contacts. FitHiC2 also offers a merging filter module that eliminates indirect (bystander) interactions, substantially reducing the number of reported contacts without sacrificing recovery of key loops, such as those between convergent CTCF binding sites. Here, we describe how to apply the FitHiC2 protocol to three use cases: (i) 5-kb-resolution Hi-C data of chromosome 5 from GM12878 (a human lymphoblastoid cell line), (ii) 40-kb-resolution whole-genome Hi-C data from IMR90 (human lung fibroblasts), and (iii) budding yeast whole-genome Hi-C data at single-restriction-site (EcoRI) resolution. Including preprocessing, the procedure takes 12 h when all use cases are run sequentially (4 h when they are run in parallel). With recent improvements to its implementation, FitHiC2 (8 processors and 16 GB of memory) also scales to genome-wide analysis of the highest-resolution (1 kb) Hi-C data available to date (48 h, with 32 GB peak memory). FitHiC2 is available through Bioconda, GitHub and the Python Package Index.
Fit-Hi-C is a computational tool for identifying statistically significant contacts from Hi-C data. This protocol describes how to apply the new version, called FitHiC2, to high-resolution Hi-C data, demonstrating its added functionalities.
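The core statistical idea summarized in the abstract (a monotonically non-increasing distance/contact-probability relationship, scaled by bin-specific biases, used as a null model for each observed count) can be illustrated with a short, standard-library-only Python sketch. This is a hedged illustration, not FitHiC2's implementation: isotonic regression (pool-adjacent-violators) is swapped in for Fit-Hi-C's spline, every distance is its own group, the null is assumed to be binomial with Benjamini-Hochberg q-values, and all function names (`pava_nonincreasing`, `binom_sf`, `benjamini_hochberg`, `fit_hic_like`) are hypothetical.

```python
"""Minimal Fit-Hi-C-style significance test (illustrative sketch only).

Assumptions, not FitHiC2's code: contacts are intra-chromosomal
(bin_i, bin_j, count) triples at one resolution with bin_i < bin_j; biases
are per-bin multiplicative factors (1.0 = no bias); isotonic regression
(pool-adjacent-violators) replaces Fit-Hi-C's monotone spline; each distance
is its own group; the null is binomial with Benjamini-Hochberg q-values.
"""
import math
from collections import defaultdict


def pava_nonincreasing(values, weights):
    """Weighted least-squares fit constrained to be non-increasing."""
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Pool neighbouring blocks while the fit would increase.
        while len(blocks) > 1 and blocks[-2][0] < blocks[-1][0]:
            v2, w2, n2 = blocks.pop()
            v1, w1, n1 = blocks.pop()
            wsum = w1 + w2
            blocks.append([(v1 * w1 + v2 * w2) / wsum, wsum, n1 + n2])
    fitted = []
    for mean, _, n in blocks:
        fitted.extend([mean] * n)
    return fitted


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term in log space."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        total += math.exp(math.lgamma(n + 1) - math.lgamma(i + 1)
                          - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
    return min(total, 1.0)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda idx: pvals[idx])
    qvals, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        qvals[idx] = running_min
    return qvals


def fit_hic_like(contacts, n_bins, biases=None):
    """Return (bin_i, bin_j, count, p_value, q_value) for every contact."""
    biases = biases or [1.0] * n_bins
    total = sum(c for _, _, c in contacts)
    # Mean contact probability at each distance d, averaged over all
    # n_bins - d bin pairs at that distance (unobserved pairs count as 0).
    summed = defaultdict(int)
    for i, j, c in contacts:
        summed[j - i] += c
    distances = sorted(summed)
    n_pairs = [n_bins - d for d in distances]
    probs = [summed[d] / (n * total) for d, n in zip(distances, n_pairs)]
    prior = dict(zip(distances, pava_nonincreasing(probs, n_pairs)))
    # Expected probability = distance prior scaled by both bins' biases.
    pvals = [binom_sf(c, total, prior[j - i] * biases[i] * biases[j])
             for i, j, c in contacts]
    qvals = benjamini_hochberg(pvals)
    return [(i, j, c, p, q)
            for (i, j, c), p, q in zip(contacts, pvals, qvals)]


if __name__ == "__main__":
    # Toy 10-bin map: counts decay with distance; pair (2, 7) is enriched.
    toy = [(i, j, 60 if (i, j) == (2, 7) else 40 // (j - i))
           for i in range(10) for j in range(i + 1, 10)]
    for row in sorted(fit_hic_like(toy, 10), key=lambda r: r[3])[:3]:
        print(row)  # (2, 7, 60, ...) should rank first
```

Pooling adjacent violators guarantees that the fitted prior never increases with distance, which mirrors the monotonicity constraint the abstract describes while staying free of parametric assumptions. The term-by-term binomial tail is adequate for toy data; a real genome-wide run would need a vectorized or library survival function.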
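The abstract also mentions a merging filter that removes indirect (bystander) contacts surrounding a true loop. One simple way to realize that general idea is sketched below: significant contacts that touch in the 2D contact map are grouped with a union-find structure, and only the most significant contact of each group is reported. This is a hypothetical simplification, not FitHiC2's merge-filter algorithm; the function name `merge_adjacent`, the 0.01 q-value cut-off, and the 8-neighbour adjacency rule are illustrative assumptions, and the real module may group and rank contacts differently.

```python
"""Hypothetical bystander-contact filter (illustrative, not FitHiC2's code).

Significant contacts (bin_i, bin_j, q_value) that touch in the 2D contact map
(8-neighbour adjacency) are grouped with union-find, and only the most
significant contact (lowest q-value) of each group is reported.
"""


def merge_adjacent(contacts, q_threshold=0.01):
    """Keep one representative contact per cluster of adjacent significant hits."""
    sig = [c for c in contacts if c[2] <= q_threshold]
    index = {(i, j): k for k, (i, j, _) in enumerate(sig)}
    parent = list(range(len(sig)))

    def find(k):
        # Follow parent links to the cluster root, halving paths as we go.
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    # Union every significant contact with its significant 2D neighbours.
    for k, (i, j, _) in enumerate(sig):
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                other = index.get((i + di, j + dj))
                if other is not None:
                    parent[find(other)] = find(k)

    # Report the lowest-q contact of each cluster.
    best = {}
    for k, contact in enumerate(sig):
        root = find(k)
        if root not in best or contact[2] < best[root][2]:
            best[root] = contact
    return sorted(best.values())


if __name__ == "__main__":
    hits = [(10, 50, 1e-8), (10, 51, 1e-5), (11, 51, 1e-4), (30, 90, 1e-3)]
    print(merge_adjacent(hits))  # [(10, 50, 1e-08), (30, 90, 0.001)]
```

Collapsing each connected cluster to a single representative is what shrinks the number of reported contacts, while an isolated strong contact (such as a loop between convergent CTCF sites) survives untouched.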
Pages: 991–1012
Number of pages: 22
Related papers
50 in total
  • [41] HiC-ACT: improved detection of chromatin interactions from Hi-C data via aggregated Cauchy test
    Lagler, Taylor M.
    Abnousi, Armen
    Hu, Ming
    Yang, Yuchen
    Li, Yun
    AMERICAN JOURNAL OF HUMAN GENETICS, 2021, 108 (02) : 257 - 268
  • [42] Inferring Radial Organization of Chromosomal Territories from Hi-C Data
    Das, Priyojit
    Sanders, Jacob T.
    Shen, Tongye
    McCord, Rachel P.
    BIOPHYSICAL JOURNAL, 2020, 118 (03) : 549A - 549A
  • [43] Graph-Based Genome Inference from Hi-C Data
    Shen, Yihang
    Yu, Lingge
    Qiu, Yutong
    Zhang, Tianyu
    Kingsford, Carl
    RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY, RECOMB 2024, 2024, 14758 : 115 - 130
  • [44] Translocation detection from Hi-C data via scan statistics
    Cheng, Anthony
    Mao, Disheng
    Zhang, Yuping
    Glaz, Joseph
    Ouyang, Zhengqing
    BIOMETRICS, 2023, 79 (02) : 1306 - 1317
  • [45] FIREcaller: Detecting frequently interacting regions from Hi-C data
    Crowley, Cheynna
    Yang, Yuchen
    Qiu, Yunjiang
    Hu, Benxia
    Abnousi, Armen
    Lipinski, Jakub
    Plewczynski, Dariusz
    Wu, Di
    Won, Hyejung
    Ren, Bing
    Hu, Ming
    Li, Yun
    COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL, 2021, 19 : 355 - 362
  • [46] Unsupervised Learning from Noisy Networks with Applications to Hi-C Data
    Wang, Bo
    Zhu, Junjie
    Ursu, Oana
    Pourshafeie, Armin
    Batzoglou, Serafim
    Kundaje, Anshul
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [48] TADfit is a multivariate linear regression model for profiling hierarchical chromatin domains on replicate Hi-C data
    Liu, Erhu
    Lyu, Hongqiang
    Peng, Qinke
    Liu, Yuan
    Wang, Tian
    Han, Jiuqiang
    COMMUNICATIONS BIOLOGY, 2022, 5 (01)
  • [49] Inferring time series chromatin states for promoter-enhancer pairs based on Hi-C data
    Miko, Henriette
    Qiu, Yunjiang
    Gaertner, Bjoern
    Sander, Maike
    Ohler, Uwe
    BMC GENOMICS, 2021, 22 (01)
  • [50] Pgltools: a genomic arithmetic tool suite for manipulation of Hi-C peak and other chromatin interaction data
    Greenwald, William W.
    Li, He
    Smith, Erin N.
    Benaglio, Paola
    Nariai, Naoki
    Frazer, Kelly A.
    BMC BIOINFORMATICS, 18