Benchmarking and building DNA binding affinity models using allele-specific and allele-agnostic transcription factor binding data

Cited by: 0
Authors
Li, Xiaoting [1]
Melo, Lucas A. N. [1]
Bussemaker, Harmen J. [1, 2]
Affiliations
[1] Columbia Univ, Dept Biol Sci, New York, NY 10027 USA
[2] Columbia Univ, Dept Syst Biol, New York, NY 10032 USA
Source
GENOME BIOLOGY | 2024, Volume 25, Issue 01
Keywords
Gene expression regulation; Non-coding variants; Transcription factors; Allele-specific binding; ChIP-seq; CTCF; Motif discovery; Biophysically interpretable machine learning; Statistical modeling; ChIP-exo; CUT&Tag; EBF1; PU.1/SPI1; SEQUENCE VARIATION; FACTOR OCCUPANCY; DISEASE; COMMON;
DOI
10.1186/s13059-024-03424-2
Chinese Library Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Transcription factors (TFs) bind to DNA in a highly sequence-specific manner. This specificity manifests itself in vivo as differences in TF occupancy between the two alleles at heterozygous loci. Genome-scale assays such as ChIP-seq are currently limited in their power to detect allele-specific binding (ASB), both in terms of read coverage and in the representation of individual variants in the cell lines used. This makes prediction of allelic differences in TF binding from sequence alone desirable, provided that the reliability of such predictions can be quantitatively assessed.
Results: Here we propose methods for benchmarking sequence-to-affinity models for TF binding in terms of their ability to predict allelic imbalances in ChIP-seq counts. We use a likelihood function based on an over-dispersed binomial distribution to aggregate evidence for allelic preference across the genome without requiring statistical significance for individual variants. This allows us to systematically compare predictive performance when multiple binding models for the same TF are available. To facilitate the de novo inference of high-quality models from paired-end in vivo binding data such as ChIP-seq, ChIP-exo, and CUT&Tag without read mapping or peak calling, we introduce an extensible reimplementation of our biophysically interpretable machine learning framework, named PyProBound. Explicitly accounting for assay-specific bias in DNA fragmentation rate when training on ChIP-seq yields improved TF binding models. Moreover, we show how PyProBound can leverage our threshold-free ASB likelihood function to perform de novo motif discovery using allele-specific ChIP-seq counts.
Conclusion: Our work provides new strategies for predicting the functional impact of non-coding variants.
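As a rough illustration of how an over-dispersed binomial likelihood can pool evidence across variants without per-variant significance calls, the Python sketch below scores allelic read counts with a beta-binomial distribution. It is not the authors' implementation and does not use the PyProBound API. The function names, the logistic link between a predicted log affinity ratio and the expected alternate-allele fraction, the `scale` and `concentration` parameters, and the example counts are all assumptions made for illustration.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_logpmf(k, n, p, concentration):
    """Log-probability of k successes in n trials under a beta-binomial
    with mean p; smaller concentration means more over-dispersion."""
    a = p * concentration
    b = (1.0 - p) * concentration
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)


def asb_log_likelihood(variants, scale=1.0, concentration=20.0):
    """Sum the log-likelihood over all heterozygous variants.

    Each variant is (alt_count, ref_count, log_affinity_ratio), where
    log_affinity_ratio is a hypothetical model prediction of
    log(affinity_alt / affinity_ref). Setting scale=0 gives the null
    model in which both alleles are expected to be bound equally.
    """
    total = 0.0
    for alt_count, ref_count, log_affinity_ratio in variants:
        # Assumed logistic link from predicted affinity ratio to the
        # expected fraction of reads carrying the alternate allele.
        p_alt = 1.0 / (1.0 + exp(-scale * log_affinity_ratio))
        total += beta_binomial_logpmf(
            alt_count, alt_count + ref_count, p_alt, concentration
        )
    return total


# Made-up example counts and predictions, for illustration only.
variants = [(30, 10, 1.2), (5, 20, -1.5), (12, 11, 0.0)]
model_ll = asb_log_likelihood(variants)
null_ll = asb_log_likelihood(variants, scale=0.0)
print("log-likelihood gain over null:", model_ll - null_ll)
```

Under these assumptions, a binding model whose predictions agree with the observed allelic imbalances gains log-likelihood relative to the null, and competing models for the same TF could be ranked by that gain.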
Pages: 15