Sequence-Only Prediction of Super-Enhancers in Human Cell Lines Using Transformer Models

Cited by: 1
Authors
Kravchuk, Ekaterina V. [1]
Ashniev, German A. [1,2,3]
Gladkova, Marina G. [1,4]
Orlov, Alexey V. [1]
Zaitseva, Zoia G. [1]
Malkerov, Juri A. [1]
Orlova, Natalia N. [1]
Affiliations
[1] Russian Acad Sci, Prokhorov Gen Phys Inst, 38 Vavilov St, Moscow 119991, Russia
[2] Lomonosov Moscow State Univ, Fac Biol, MSU, Leninskiye Gory 1-12, Moscow 119991, Russia
[3] Inst Informat Transmiss Problems RAS, Moscow 127051, Russia
[4] Lomonosov Moscow State Univ, Fac Bioengn & Bioinformat, MSU, GSP-1,Leninskiye Gory 1-73, Moscow 119234, Russia
Source
BIOLOGY-BASEL | 2025 / Vol. 14 / Issue 02
Funding
Russian Science Foundation;
Keywords
bioinformatics; super-enhancers; neural networks; transformers; BigBird; BERT; IDENTITY; GENE; PROGNOSIS; DATABASE; MEGF10; MOUSE;
DOI
10.3390/biology14020172
Chinese Library Classification number
Q [Biological Sciences];
Subject classification code
07; 0710; 09;
Abstract
This study applies transformer-based deep learning models to super-enhancer (SE) prediction in human tumor cell lines, focusing on sequence-only features of super-enhancer and enhancer elements in the human genome. The proposed SE-prediction method uses GENA-LM, which handles long DNA sequences, for a classification task that distinguishes super-enhancers from enhancers, using H3K36me, H3K4me1, H3K4me3 and H3K27ac landscape datasets from the HeLa, HEK293, H2171, Jurkat, K562, MM1S and U87 cell lines. The model was fine-tuned on relevant sequence data, allowing extended genomic sequences to be analyzed without the epigenetic markers required by earlier approaches. The study reports balanced accuracy that surpasses previous models such as SENet, particularly in the HEK293 and K562 cell lines. It was also shown that super-enhancers frequently co-localize with epigenetic marks such as H3K4me3 and H3K27ac. The attention mechanism of the model provided insights into the sequence features contributing to SE classification, indicating a correlation between sequence-only features and these epigenetic landscapes. These findings support the potential use of transformer models in further genomic sequence analysis for bioinformatics applications in enhancer/super-enhancer characterization and gene regulation studies.
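The abstract reports results as balanced accuracy, a natural choice when super-enhancers are far rarer than typical enhancers. As a minimal sketch of that metric only (not the authors' evaluation code), the snippet below computes it as the mean of per-class recall. The function name `balanced_accuracy` and the toy labels are hypothetical and chosen for illustration.

```python
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall over the classes in y_true."""
    support = Counter(y_true)  # number of true examples per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    return sum(hits[c] / support[c] for c in support) / len(support)


# Hypothetical toy labels: 1 = super-enhancer, 0 = typical enhancer.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

# Plain accuracy is dominated by the majority (enhancer) class.
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print("accuracy:", plain_accuracy)
print("balanced accuracy:", balanced_accuracy(y_true, y_pred))
```

On this toy data the plain accuracy is 0.8 while the balanced accuracy is 0.6875, because only one of the two super-enhancers is recovered. That gap is why a class-balanced metric is the more informative one for an imbalanced SE-versus-enhancer task.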
Pages: 20