STATISTICAL CURVE MODELS FOR INFERRING 3D CHROMATIN ARCHITECTURE

Citations: 0
Authors
Tuzhilina, Elena [1]
Hastie, Trevor [2]
Segal, Mark [3]
Affiliations
[1] Univ Toronto, Dept Stat Sci, Toronto, ON, Canada
[2] Stanford Univ, Dept Stat, Stanford, CA USA
[3] Univ Calif Irvine, Dept Epidemiol & Biostat, Irvine, CA USA
Funding
National Science Foundation (USA); Natural Sciences and Engineering Research Council of Canada; National Institutes of Health (USA)
Keywords
Spatial structure; conformation reconstruction; metric scaling; splines; REVEALS; GENOME; PRINCIPLES; REGRESSION
DOI
10.1214/24-AOAS1917
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
Reconstructing three-dimensional (3D) chromatin structure from conformation capture assays (such as Hi-C) is a critical task in computational biology, since chromatin spatial architecture plays a vital role in numerous cellular processes and direct imaging is challenging. Most existing algorithms that operate on Hi-C contact matrices produce reconstructed 3D configurations in the form of a polygonal chain. However, none of these methods exploits the fact that the target solution is a (smooth) curve in 3D: this contiguity attribute is either ignored or indirectly addressed by imposing spatial constraints that are challenging to formulate. In this paper we develop both B-spline and smoothing spline techniques for directly capturing this potentially complex 1D curve. We subsequently combine these techniques with a Poisson model for contact counts and compare their performance on a real data example. In addition, motivated by the sparsity of Hi-C contact data, especially when obtained from single-cell assays, we appreciably extend the class of distributions used to model contact counts. We build a general distribution-based metric scaling (DBMS) framework from which we develop zero-inflated and hurdle Poisson models as well as negative binomial applications. Illustrative applications use bulk Hi-C data from IMR90 cells and single-cell Hi-C data from mouse embryonic stem cells.
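As a rough illustration of the idea in the abstract, the following Python sketch represents the chromatin conformation as a B-spline curve and scores it with a Poisson likelihood for contact counts. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `bspline_basis` and `poisson_nll`, the clamped equally spaced knots, and the link log(lambda_ij) = beta - ||x_i - x_j||^2 are all assumptions introduced here, since the abstract does not specify them.

```python
import numpy as np
from scipy.interpolate import BSpline


def bspline_basis(n_points, n_basis, degree=3):
    """Return an (n_points, n_basis) B-spline basis matrix H.

    Assumption: clamped, equally spaced knots on [0, 1]; loci are placed
    at the midpoints of n_points equal-width bins along the chromosome.
    """
    n_interior = n_basis - degree - 1
    interior = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]
    knots = np.concatenate(
        [np.zeros(degree + 1), interior, np.ones(degree + 1)]
    )
    t = (np.arange(n_points) + 0.5) / n_points
    # Identity coefficients make column j of the output the j-th basis function.
    return BSpline(knots, np.eye(n_basis), degree)(t)


def poisson_nll(theta, H, C, beta):
    """Poisson negative log-likelihood (up to a constant in C).

    The curve is X = H @ theta with theta of shape (n_basis, 3).
    Assumed link (hypothetical): log(lambda_ij) = beta - ||x_i - x_j||^2.
    """
    X = H @ theta
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    log_lam = beta - sq_dist
    # Each unordered pair of loci is counted once.
    iu = np.triu_indices(C.shape[0], k=1)
    return np.sum(np.exp(log_lam[iu]) - C[iu] * log_lam[iu])
```

Because the curve is parameterized by the small coefficient matrix `theta` rather than by one free point per locus, smoothness is built into the representation instead of being imposed through separate spatial constraints. A generic optimizer such as `scipy.optimize.minimize` could be applied to `poisson_nll` over the flattened `theta`; the zero-inflated, hurdle, and negative binomial variants mentioned in the abstract would replace only the per-pair likelihood term.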
Pages: 2979-3006
Page count: 28
Related papers
35 in total
[1]   Three-dimensional modeling of the P. falciparum genome during the erythrocytic cycle reveals a strong connection between genome architecture and gene expression [J].
Ay, Ferhat ;
Bunnik, Evelien M. ;
Varoquaux, Nelle ;
Bol, Sebastiaan M. ;
Prudhomme, Jacques ;
Vert, Jean-Philippe ;
Noble, William Stafford ;
Le Roch, Karine G. .
GENOME RESEARCH, 2014, 24 (06) :974-988
[2]   Identifying 3D Genome Organization in Diploid Organisms via Euclidean Distance Geometry [J].
Belyaeva, Anastasiya ;
Kubjas, Kaie ;
Sun, Lawrence J. ;
Uhler, Caroline .
SIAM JOURNAL ON MATHEMATICS OF DATA SCIENCE, 2022, 4 (01) :204-228
[3]   Block Power Method for SVD Decomposition [J].
Bentbib, A. H. ;
Kanber, A. .
ANALELE STIINTIFICE ALE UNIVERSITATII OVIDIUS CONSTANTA-SERIA MATEMATICA, 2015, 23 (02) :45-58
[4]   Discovering hotspots in functional genomic data superposed on 3D chromatin configuration reconstructions [J].
Capurso, Daniel ;
Bengtsson, Henrik ;
Segal, Mark R. .
NUCLEIC ACIDS RESEARCH, 2016, 44 (05) :2028-2035
[5]  
CAUER A. G., 2019, 19 INT WORKSH ALG BI, V143
[6]   A three-dimensional model of the yeast genome [J].
Duan, Zhijun ;
Andronescu, Mirela ;
Schutz, Kevin ;
McIlwain, Sean ;
Kim, Yoo Jung ;
Lee, Choli ;
Shendure, Jay ;
Fields, Stanley ;
Blau, C. Anthony ;
Noble, William S. .
NATURE, 2010, 465 (7296) :363-367
[7]  
GREEN P. J., 1994, MONOGRAPHS STAT APPL, V58
[8]   Generalized hurdle count data regression models [J].
Gurmu, S .
ECONOMICS LETTERS, 1998, 58 (03) :263-268
[9]   PRINCIPAL CURVES [J].
HASTIE, T ;
STUETZLE, W .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1989, 84 (406) :502-516
[10]  
Hastie T., 2001, The elements of statistical learning