MaxHiC: A robust background correction model to identify biologically relevant chromatin interactions in Hi-C and capture Hi-C experiments

Times cited: 8
Authors
Alinejad-Rokny, Hamid [1,2,3,4]
Modegh, Rassa Ghavami [5]
Rabiee, Hamid R. [5]
Sarbandi, Ehsan Ramezani
Rezaie, Narges [6]
Tam, Kin Tung [1,2]
Forrest, Alistair R. R. [1,2]
Affiliations
[1] Univ Western Australia, Harry Perkins Inst Med Res, QEII Med Ctr, Perth, Australia
[2] Univ Western Australia, Ctr Med Res, Perth, Australia
[3] UNSW Sydney, Grad Sch Biomed Engn, Bio Med Machine Learning Lab BML, Sydney, Australia
[4] Macquarie Univ, AI-Enabled Proc (AIP) Res Ctr, Hlth Data Analyt Program, Sydney, Australia
[5] Sharif Univ Technol, Dept Comp Engn, Bioinformat & Computat Biol Lab, Tehran, Iran
[6] Univ Calif Irvine, Ctr Complex Biol Syst, Irvine, CA USA
Funding
Australian Research Council; Medical Research Council (UK)
Keywords
EXPRESSION; REVEALS; ORGANIZATION; ANNOTATION; PRINCIPLES
DOI
10.1371/journal.pcbi.1010241
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Author summary
MaxHiC is a robust machine-learning-based tool for identifying significant interacting regions from both Hi-C and capture Hi-C data. All existing models are designed for either Hi-C or capture Hi-C data, whereas we developed MaxHiC to be applicable to both Hi-C and capture Hi-C libraries, using two different models for the two data types. MaxHiC can also analyse very deep Hi-C libraries (e.g., Micro-C) without computational issues. MaxHiC significantly outperforms existing Hi-C significant interaction callers, and even Hi-C loop callers, in terms of enrichment for interactions between known regulatory regions and for biologically relevant interactions.

Hi-C is a genome-wide chromosome conformation capture technology that detects interactions between pairs of genomic regions and probes higher-order chromatin structure. Conceptually, Hi-C data record the interaction frequency between every position in the genome and every other position. Biologically functional interactions are expected to occur more frequently than transient background and artefactual interactions. To identify biologically relevant interactions, several background models that account for biases such as genomic distance, GC content and mappability have been proposed. Here we introduce MaxHiC, a background correction tool that deals with these complex biases and robustly identifies statistically significant interactions in both Hi-C and capture Hi-C experiments. MaxHiC uses a negative binomial distribution model and a maximum likelihood technique to correct biases in both Hi-C and capture Hi-C libraries. We systematically benchmark MaxHiC against major Hi-C background correction tools, including Hi-C significant interaction callers (SICs) and Hi-C loop callers, using published Hi-C, capture Hi-C and Micro-C datasets.
Our results demonstrate that (1) interacting regions identified by MaxHiC show significantly greater overlap with known regulatory features (e.g., active chromatin histone marks, CTCF binding sites, DNase sensitivity) and with disease-associated genome-wide association study (GWAS) SNPs than those identified by existing models; (2) the pairs of interacting regions are more likely to be linked by eQTL pairs; and (3) they are more likely to link known regulatory features, including known functional enhancer-promoter pairs validated by CRISPRi, than pairs identified by any of the existing methods. We also demonstrate that interactions between different genomic region types have distinct distance distributions that are revealed only by MaxHiC. MaxHiC is publicly available as a Python package for the analysis of Hi-C, capture Hi-C and Micro-C data.
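The abstract names two ingredients, a negative binomial background model and maximum-likelihood fitting. The sketch below illustrates only those two ideas and is not MaxHiC's actual model. It assumes, for illustration, that the expected count for a pair of bins depends only on their genomic distance d through log μ = a + b·log d, with one dispersion r shared by all pairs. GC content, mappability, capture-specific effects and any iterative refitting are omitted. All function names (`fit_background`, `nb_upper_tail`, `call_significant`, etc.) are hypothetical and not part of the MaxHiC package.

```python
import math
from typing import List, Sequence, Tuple


def nb_logpmf(k: int, mu: float, r: float) -> float:
    """Log P(K = k) for a negative binomial with mean mu and size (dispersion) r."""
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))


def fit_mean_model(counts: Sequence[int], log_d: Sequence[float], r: float,
                   iters: int = 100, tol: float = 1e-10) -> Tuple[float, float]:
    """ML fit of log(mu) = a + b * log(distance) by Fisher scoring (IRLS), r held fixed."""
    etas = [math.log(k + 0.5) for k in counts]  # usual GLM starting values
    a = b = 0.0
    for _ in range(iters):
        # Weighted normal equations (X^T W X) beta = X^T W z with X = [1, log d].
        s00 = s01 = s11 = t0 = t1 = 0.0
        for k, x, eta in zip(counts, log_d, etas):
            mu = math.exp(eta)
            w = r * mu / (r + mu)       # mu^2 / Var(K), where Var(K) = mu + mu^2 / r
            z = eta + (k - mu) / mu     # working response for the log link
            s00 += w
            s01 += w * x
            s11 += w * x * x
            t0 += w * z
            t1 += w * x * z
        det = s00 * s11 - s01 * s01
        a_new = (s11 * t0 - s01 * t1) / det
        b_new = (s00 * t1 - s01 * t0) / det
        converged = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        etas = [a + b * x for x in log_d]
        if converged:
            break
    return a, b


def fit_background(counts: Sequence[int], distances: Sequence[float]) -> Tuple[float, float, float]:
    """Profile the likelihood over a grid of r; return the (a, b, r) with the highest log-likelihood."""
    log_d = [math.log(d) for d in distances]
    best = None
    for r in (0.5 * 1.25 ** i for i in range(40)):  # r from 0.5 up to about 3000
        a, b = fit_mean_model(counts, log_d, r)
        ll = sum(nb_logpmf(k, math.exp(a + b * x), r) for k, x in zip(counts, log_d))
        if best is None or ll > best[0]:
            best = (ll, a, b, r)
    return best[1], best[2], best[3]


def nb_upper_tail(k: int, mu: float, r: float) -> float:
    """P(K >= k), computed as 1 - P(K < k); extremely small tails round to 0.0."""
    cdf = sum(math.exp(nb_logpmf(j, mu, r)) for j in range(k))
    return max(0.0, 1.0 - cdf)


def call_significant(counts: Sequence[int], distances: Sequence[float],
                     alpha: float = 0.01) -> Tuple[List[int], List[float], Tuple[float, float, float]]:
    """Indices of bin pairs whose count exceeds the fitted background (Bonferroni-corrected)."""
    a, b, r = fit_background(counts, distances)
    pvals = [nb_upper_tail(k, math.exp(a + b * math.log(d)), r)
             for k, d in zip(counts, distances)]
    n = len(pvals)
    return [i for i, p in enumerate(pvals) if p * n < alpha], pvals, (a, b, r)


# Toy data: counts decay roughly as 200/d, with one pair (d = 20) far above background.
distances = list(range(1, 41))
counts = [round(200 / d) for d in distances]
counts[19] = 60
significant, pvals, (a, b, r) = call_significant(counts, distances)
print(significant, round(b, 2), round(r, 2))
```

Fisher scoring is used for the mean parameters because the negative binomial log-likelihood is concave in log μ for fixed r, and a one-dimensional grid search over r keeps the sketch free of third-party dependencies. A Bonferroni correction stands in for whatever multiple-testing procedure a real caller would use.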
Pages: 26