DeepChrom: A Diffusion-Based Framework for Long-Tailed Chromatin State Prediction

Cited by: 1
Authors
Liu, Yuhang [1]
Wang, Zixuan [2]
Lv, Jiaheng [1]
Zhang, Yongqing [1]
Affiliations
[1] Chengdu Univ Informat Technol, Sch Comp Sci, Chengdu 610225, Peoples R China
[2] Sichuan Univ, Coll Elect & Informat Engn, Chengdu 610065, Peoples R China
Source
PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT III | 2024 / Vol. 14427
Funding
National Natural Science Foundation of China
Keywords
Chromatin state; Diffusion model; Long-tailed learning; Bioinformatics
DOI
10.1007/978-981-99-8435-0_15
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Chromatin states reflect distinct biological roles of the genome and can systematically characterize regulatory elements and their functional interactions. Despite extensive computational studies, accurate prediction of chromatin states remains challenging because of long-tailed class imbalance. Here, we propose a deep-learning framework, DeepChrom, to predict long-tailed chromatin states directly from DNA sequence. The framework includes a diffusion-based model that balances the samples across classes by generating pseudo-samples, and a novel dilated CNN-based model for chromatin state prediction. In addition, we develop a novel equalization loss that increases the penalty on generated samples, which alleviates the impact of the bias between ground-truth and generated samples. DeepChrom achieves outstanding performance on nine human cell types under our designed paradigm. Specifically, our proposed long-tailed learning strategy surpasses the traditional training method by 0.056 in accuracy (Acc). To our knowledge, DeepChrom is pioneering in using a diffusion-based model to achieve sample balance when predicting long-tailed chromatin states.
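The abstract does not give the formula for the equalization loss or the balancing procedure, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes that balancing means topping up every class to the size of the largest class, and that the loss is a per-sample weighted cross-entropy in which generated samples carry a weight greater than 1. The function names and the `gen_weight` parameter are hypothetical.

```python
import math
from collections import Counter


def pseudo_samples_needed(labels):
    """Return, per class, how many generated samples would bring it up to
    the size of the largest class (assumed balancing target)."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


def weighted_cross_entropy(probs, labels, is_generated, gen_weight=2.0):
    """Cross-entropy averaged over samples, with generated samples
    penalized gen_weight times more than real ones.

    probs: list of per-class probability lists (one list per sample)
    labels: true class index for each sample
    is_generated: True where the sample is a pseudo-sample
    gen_weight: hypothetical weight; the paper's actual value is not stated
    """
    total = 0.0
    for p, y, generated in zip(probs, labels, is_generated):
        weight = gen_weight if generated else 1.0
        total += -weight * math.log(p[y])
    return total / len(labels)
```

For example, `pseudo_samples_needed([0, 0, 0, 1, 2, 2])` reports that class 1 needs two generated samples and class 2 needs one. In the loss, raising `gen_weight` makes errors on generated samples more costly than errors on real samples.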
Pages: 188-199
Page count: 12