Sequence-Only Prediction of Super-Enhancers in Human Cell Lines Using Transformer Models

Times cited: 1
Authors
Kravchuk, Ekaterina V. [1]
Ashniev, German A. [1, 2, 3]
Gladkova, Marina G. [1, 4]
Orlov, Alexey V. [1]
Zaitseva, Zoia G. [1]
Malkerov, Juri A. [1]
Orlova, Natalia N. [1]
Affiliations
[1] Russian Acad Sci, Prokhorov Gen Phys Inst, 38 Vavilov St, Moscow 119991, Russia
[2] Lomonosov Moscow State Univ, Fac Biol, MSU, Leninskiye Gory 1-12, Moscow 119991, Russia
[3] Inst Informat Transmiss Problems RAS, Moscow 127051, Russia
[4] Lomonosov Moscow State Univ, Fac Bioengn & Bioinformat, MSU, GSP-1,Leninskiye Gory 1-73, Moscow 119234, Russia
Source
BIOLOGY-BASEL | 2025 / Vol. 14 / No. 02
Funding
Russian Science Foundation;
Keywords
bioinformatics; super-enhancers; neural networks; transformers; BigBird; BERT; IDENTITY; GENE; PROGNOSIS; DATABASE; MEGF10; MOUSE;
DOI
10.3390/biology14020172
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Subject classification code
07 ; 0710 ; 09 ;
Abstract
This study applies transformer-based deep learning models to the prediction of super-enhancers (SEs) in human tumor cell lines, focusing on sequence-only features of super-enhancer and enhancer elements in the human genome. The proposed SE-prediction method applies GENA-LM, which can handle long DNA sequences, to a classification task: distinguishing super-enhancers from enhancers using H3K36me, H3K4me1, H3K4me3 and H3K27ac landscape datasets from the HeLa, HEK293, H2171, Jurkat, K562, MM1S and U87 cell lines. The model was fine-tuned on relevant sequence data, allowing extended genomic sequences to be analyzed without the epigenetic markers required by earlier approaches. The model's balanced accuracy surpassed that of previous models such as SENet, particularly in the HEK293 and K562 cell lines. The study also showed that super-enhancers frequently co-localize with epigenetic marks such as H3K4me3 and H3K27ac. The model's attention mechanism provided insights into the sequence features contributing to SE classification, indicating a correlation between sequence-only features and the epigenetic landscapes mentioned above. These findings support the potential use of transformer models in further genomic sequence analysis for bioinformatics applications in enhancer/super-enhancer characterization and gene regulation studies.
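The abstract reports balanced accuracy without defining it. As a minimal sketch, assuming the standard definition (the mean of sensitivity and specificity), the metric for a binary SE-versus-enhancer classifier could be computed as follows. The function name and the toy labels are hypothetical illustrations, not data or code from the study.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels.

    Assumed encoding (not from the paper): 1 = super-enhancer,
    0 = typical enhancer.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # recall on super-enhancers
    specificity = tn / (tn + fp)  # recall on typical enhancers
    return (sensitivity + specificity) / 2


# Invented toy labels: 2 super-enhancers among 10 regions.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = balanced_accuracy(y_true, y_pred)
```

On these toy labels plain accuracy is 0.8 while balanced accuracy is 0.6875. The gap is why the metric suits tasks where super-enhancers are far rarer than typical enhancers: each class contributes equally regardless of its size.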
Pages: 20