Benchmarking and building DNA binding affinity models using allele-specific and allele-agnostic transcription factor binding data

Times cited: 0
Authors
Li, Xiaoting [1]
Melo, Lucas A. N. [1]
Bussemaker, Harmen J. [1, 2]
Affiliations
[1] Columbia Univ, Dept Biol Sci, New York, NY 10027 USA
[2] Columbia Univ, Dept Syst Biol, New York, NY 10032 USA
Source
GENOME BIOLOGY | 2024 / Vol. 25 / Issue 01
Keywords
Gene expression regulation; Non-coding variants; Transcription factors; Allele-specific binding; ChIP-seq; CTCF; Motif discovery; Biophysically interpretable machine learning; Statistical modeling; ChIP-exo; CUT&Tag; EBF1; PU.1/SPI1; SEQUENCE VARIATION; FACTOR OCCUPANCY; DISEASE; COMMON;
DOI
10.1186/s13059-024-03424-2
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification code
071005; 0836; 090102; 100705
Abstract
Background: Transcription factors (TFs) bind to DNA in a highly sequence-specific manner. This specificity manifests itself in vivo as differences in TF occupancy between the two alleles at heterozygous loci. Genome-scale assays such as ChIP-seq currently are limited in their power to detect allele-specific binding (ASB) both in terms of read coverage and representation of individual variants in the cell lines used. This makes prediction of allelic differences in TF binding from sequence alone desirable, provided that the reliability of such predictions can be quantitatively assessed.
Results: We here propose methods for benchmarking sequence-to-affinity models for TF binding in terms of their ability to predict allelic imbalances in ChIP-seq counts. We use a likelihood function based on an over-dispersed binomial distribution to aggregate evidence for allelic preference across the genome without requiring statistical significance for individual variants. This allows us to systematically compare predictive performance when multiple binding models for the same TF are available. To facilitate the de novo inference of high-quality models from paired-end in vivo binding data such as ChIP-seq, ChIP-exo, and CUT&Tag without read mapping or peak calling, we introduce an extensible reimplementation of our biophysically interpretable machine learning framework named PyProBound. Explicitly accounting for assay-specific bias in DNA fragmentation rate when training on ChIP-seq yields improved TF binding models. Moreover, we show how PyProBound can leverage our threshold-free ASB likelihood function to perform de novo motif discovery using allele-specific ChIP-seq counts.
Conclusion: Our work provides new strategies for predicting the functional impact of non-coding variants.
Pages: 15