HybProm: An attention-assisted hybrid CNN-BiLSTM model for the interpretable prediction of DNA promoter

Times cited: 0
Authors
Luo, Rentao [1]
Liu, Jiawei [1]
Guan, Lixin [1]
Li, Mengshan [1]
Affiliations
[1] Gannan Normal Univ, Coll Phys & Elect Informat, Ganzhou 341000, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Promoter; Deep learning; Attention; Gene sequences; Bioinformatics; TRANSCRIPTION START SITES; TATA-BOX; DATABASE; DOWNSTREAM; INITIATOR; ELEMENTS; REGIONS; GENES;
DOI
10.1016/j.ymeth.2025.02.001
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Promoter prediction is essential for analyzing gene structures, understanding regulatory networks and transcription mechanisms, and precisely controlling gene expression. Recently, computational and deep learning methods for promoter prediction have gained attention. However, there is still room to improve their accuracy. To address this, we propose the HybProm model, which uses DNA2Vec to transform DNA sequences into low-dimensional vectors, followed by a CNN-BiLSTM-Attention architecture to extract features and predict promoters across species, including E. coli, humans, mice, and plants. Experiments show that HybProm consistently achieves high accuracy (90%-99%) and offers good interpretability by identifying key sequence patterns and positions that drive predictions.
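The embedding and attention steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the CNN and BiLSTM layers are omitted, the k-mer length, vector size, and query vector are assumed values, and the randomly initialised table is a hypothetical stand-in for pretrained DNA2Vec vectors. The sketch assumes sequences contain only A, C, G, and T.

```python
import math
import random
from itertools import product


def kmerize(seq, k=3):
    """Split a DNA sequence into overlapping k-mers with stride 1."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_embedding_table(k=3, dim=8, seed=0):
    """Hypothetical stand-in for a pretrained k-mer embedding table.

    Each of the 4**k possible k-mers gets a random vector; a real
    pipeline would load trained vectors instead.
    """
    rng = random.Random(seed)
    return {
        "".join(p): [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        for p in product("ACGT", repeat=k)
    }


def embed(seq, table, k=3):
    """Map a sequence to a list of low-dimensional k-mer vectors."""
    return [table[kmer] for kmer in kmerize(seq, k)]


def attention_pool(vectors, query):
    """Dot-product attention over positions.

    Returns the softmax weights (one per position) and the weighted
    sum of the input vectors.
    """
    scores = [sum(a * b for a, b in zip(v, query)) for v in vectors]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [
        sum(w * v[j] for w, v in zip(weights, vectors))
        for j in range(len(query))
    ]
    return weights, pooled


table = build_embedding_table(k=3, dim=8)
sequence = "TTGACATATAAT"  # illustrative input, not taken from the paper
vectors = embed(sequence, table, k=3)
query = [1.0] * 8  # assumed; a trained model would learn this vector
weights, pooled = attention_pool(vectors, query)

# The position with the largest weight is the one the attention step
# treats as most influential for this (untrained) query.
top = max(range(len(weights)), key=weights.__getitem__)
print(kmerize(sequence)[top], round(weights[top], 3))
```

Because the weights are non-negative and sum to one, they can be read as a per-position importance profile, which is the kind of signal the abstract refers to when it mentions identifying key sequence patterns and positions.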
Pages: 71-80
Page count: 10