HiCRep: assessing the reproducibility of Hi-C data using a stratum-adjusted correlation coefficient

Cited by: 307
Authors
Yang, Tao [1]
Zhang, Feipeng [2]
Yardimci, Galip Gurkan [3]
Song, Fan [1]
Hardison, Ross C. [1,4]
Noble, William Stafford [3,5]
Yue, Feng [1,6]
Li, Qunhua [1,2]
Affiliations
[1] Penn State Univ, Bioinformat & Genom Program, University Pk, PA 16802 USA
[2] Penn State Univ, Dept Stat, University Pk, PA 16802 USA
[3] Univ Washington, Dept Genome Sci, Seattle, WA 98105 USA
[4] Penn State Univ, Dept Biochem & Mol Biol, University Pk, PA 16802 USA
[5] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98105 USA
[6] Penn State Univ, Coll Med, Dept Biochem & Mol Biol, Hershey, PA 17033 USA
Funding
National Institutes of Health (USA)
Keywords
HUMAN GENOME; CHROMOSOME CONFORMATION; HIGH-RESOLUTION; HUMAN-CELLS; CHROMATIN; ORGANIZATION; ARCHITECTURE; DOMAINS; DIFFERENTIATION; PRINCIPLES
DOI
10.1101/gr.220640.117
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Hi-C is a powerful technology for studying genome-wide chromatin interactions. However, current methods for assessing Hi-C data reproducibility can produce misleading results because they ignore spatial features in Hi-C data, such as domain structure and distance dependence. We present HiCRep, a framework for assessing the reproducibility of Hi-C data that systematically accounts for these features. In particular, we introduce a novel similarity measure, the stratum-adjusted correlation coefficient (SCC), for quantifying the similarity between Hi-C interaction matrices. Not only does it provide a statistically sound and reliable evaluation of reproducibility, SCC can also be used to quantify differences between Hi-C contact matrices and to determine the optimal sequencing depth for a desired resolution. The measure consistently shows higher accuracy than existing approaches in distinguishing subtle differences in reproducibility and depicting interrelationships of cell lineages. The proposed measure is straightforward to interpret and easy to compute, making it well-suited for providing standardized, interpretable, automatable, and scalable quality control. The freely available R package HiCRep implements our approach.
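
To make the idea of a stratum-adjusted correlation concrete, the following is a minimal Python sketch, not the HiCRep R package itself. It assumes that each stratum is one off-diagonal of the contact matrix (bins at a fixed genomic distance), that a Pearson correlation is computed within each stratum, and that strata are combined with weights proportional to stratum size times the product of the two standard deviations. The function name `stratum_adjusted_correlation` and the parameter `max_offset` are hypothetical, and the weighting scheme is an assumption; the abstract does not specify it, and any smoothing or variance stabilization used by the published method is omitted here.

```python
import numpy as np


def stratum_adjusted_correlation(a, b, max_offset):
    """Illustrative weighted average of per-diagonal Pearson correlations.

    a, b       : square contact matrices of equal shape
    max_offset : largest diagonal offset (in bins) to include (hypothetical name)
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim != 2 or a.shape[0] != a.shape[1] or a.shape != b.shape:
        raise ValueError("inputs must be square matrices of equal shape")

    weighted_sum = 0.0
    weight_total = 0.0
    for k in range(1, max_offset + 1):
        # Stratum k: all bin pairs separated by exactly k bins.
        x = np.diagonal(a, offset=k)
        y = np.diagonal(b, offset=k)
        n = x.size
        if n < 2:
            continue
        sx, sy = x.std(), y.std()
        if sx == 0.0 or sy == 0.0:
            # Correlation is undefined for a constant stratum; skip it.
            continue
        cov = np.mean((x - x.mean()) * (y - y.mean()))
        rho_k = cov / (sx * sy)
        # Assumed weight: stratum size times product of standard deviations.
        w_k = n * sx * sy
        weighted_sum += w_k * rho_k
        weight_total += w_k

    return weighted_sum / weight_total if weight_total > 0 else float("nan")
```

Because each stratum is compared only against the same genomic distance in the other matrix, the strong distance-dependent decay shared by all Hi-C maps does not by itself inflate the score, which is the concern the abstract raises about unstratified correlations.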
Pages: 1939-1949
Page count: 11