HiCRep: assessing the reproducibility of Hi-C data using a stratum-adjusted correlation coefficient

Cited by: 281
Authors
Yang, Tao [1]
Zhang, Feipeng [2]
Yardimci, Galip Gurkan [3]
Song, Fan [1]
Hardison, Ross C. [1, 4]
Noble, William Stafford [3, 5]
Yue, Feng [1, 6]
Li, Qunhua [1, 2]
Affiliations
[1] Penn State Univ, Bioinformat & Genom Program, University Pk, PA 16802 USA
[2] Penn State Univ, Dept Stat, University Pk, PA 16802 USA
[3] Univ Washington, Dept Genome Sci, Seattle, WA 98105 USA
[4] Penn State Univ, Dept Biochem & Mol Biol, University Pk, PA 16802 USA
[5] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98105 USA
[6] Penn State Univ, Coll Med, Dept Biochem & Mol Biol, Hershey, PA 17033 USA
Funding
National Institutes of Health (NIH)
Keywords
HUMAN GENOME; CHROMOSOME CONFORMATION; HIGH-RESOLUTION; HUMAN-CELLS; CHROMATIN; ORGANIZATION; ARCHITECTURE; DOMAINS; DIFFERENTIATION; PRINCIPLES
DOI
10.1101/gr.220640.117
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
Hi-C is a powerful technology for studying genome-wide chromatin interactions. However, current methods for assessing Hi-C data reproducibility can produce misleading results because they ignore spatial features in Hi-C data, such as domain structure and distance dependence. We present HiCRep, a framework for assessing the reproducibility of Hi-C data that systematically accounts for these features. In particular, we introduce a novel similarity measure, the stratum-adjusted correlation coefficient (SCC), for quantifying the similarity between Hi-C interaction matrices. SCC not only provides a statistically sound and reliable evaluation of reproducibility but can also be used to quantify differences between Hi-C contact matrices and to determine the optimal sequencing depth for a desired resolution. The measure consistently shows higher accuracy than existing approaches in distinguishing subtle differences in reproducibility and depicting interrelationships of cell lineages. The proposed measure is straightforward to interpret and easy to compute, making it well-suited for providing standardized, interpretable, automatable, and scalable quality control. The freely available R package HiCRep implements our approach.
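The abstract does not spell out how SCC is computed. The sketch below illustrates only the general idea of a stratum-adjusted correlation: stratify two contact matrices by genomic distance (one stratum per matrix diagonal), compute a Pearson correlation within each stratum, and combine the per-stratum values as a weighted average. It is not the HiCRep implementation. The function name `stratum_corr_sketch` and the `max_dist` parameter are hypothetical, no smoothing is applied, and the weights are simply the number of bins in each stratum. The published SCC applies 2D smoothing first and uses weights that also depend on the variability within each stratum; see the paper and the R package for the exact definition.

```python
import numpy as np


def stratum_corr_sketch(m1, m2, max_dist):
    """Simplified stratum-adjusted correlation (hypothetical helper, not the HiCRep API).

    Assumptions: square contact matrices of equal shape, strata are the
    diagonals 0..max_dist-1 (upper triangle), no smoothing, and each stratum
    is weighted by its number of bins rather than by HiCRep's published weights.
    """
    m1 = np.asarray(m1, dtype=float)
    m2 = np.asarray(m2, dtype=float)
    if m1.shape != m2.shape or m1.ndim != 2 or m1.shape[0] != m1.shape[1]:
        raise ValueError("matrices must be square and have the same shape")

    weighted_sum, weight_total = 0.0, 0.0
    for k in range(max_dist):
        # Stratum k: all bin pairs separated by k bins (the k-th diagonal).
        x = np.diagonal(m1, offset=k)
        y = np.diagonal(m2, offset=k)
        # Ignore bin pairs with no contacts in either matrix.
        keep = (x > 0) | (y > 0)
        x, y = x[keep], y[keep]
        # Pearson correlation is undefined for <2 points or zero variance.
        if x.size < 2 or x.std() == 0 or y.std() == 0:
            continue
        rho_k = np.corrcoef(x, y)[0, 1]
        w_k = x.size  # simplified weight: stratum size only
        weighted_sum += w_k * rho_k
        weight_total += w_k
    return weighted_sum / weight_total if weight_total > 0 else float("nan")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.poisson(5, size=(20, 20)).astype(float)
    m = a + a.T  # symmetric toy "contact matrix"
    noisy = m + rng.poisson(2, size=m.shape)
    print(stratum_corr_sketch(m, m, 10))      # identical matrices
    print(stratum_corr_sketch(m, noisy, 10))  # a noisy "replicate"
```

Because each stratum is correlated separately, the strong decay of contact counts with genomic distance cannot by itself inflate the score, which is the distance-dependence problem the abstract attributes to existing reproducibility measures.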
Pages: 1939-1949
Page count: 11