Predicting transcription factor binding sites by a multi-modal representation learning method based on cross-attention network

Cited by: 1
Authors
Wei, Yuxiao [1]
Zhang, Qi [2]
Liu, Liwei [2]
Affiliations
[1] Dalian Jiaotong Univ, Coll Software, Dalian 116028, Peoples R China
[2] Dalian Jiaotong Univ, Coll Sci, Dalian 116028, Peoples R China
Keywords
Transcription factor binding sites; Deep learning; Cross-attention mechanism; Model interpretability; CHIP-SEQ; DNA; SPECIFICITIES
DOI
10.1016/j.asoc.2024.112134
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The prediction of transcription factor binding sites (TFBS) plays a crucial role in studying cellular functions and understanding transcriptional regulatory processes. With the development of chromatin immunoprecipitation sequencing (ChIP-seq) technology, an increasing number of computer-aided TFBS prediction models have emerged. However, how to integrate multi-modal information of DNA and obtain efficient features to improve prediction accuracy remains a major challenge. Here, we propose MultiTF, a multi-modal representation learning method based on a cross-attention network for predicting transcription factor binding sites. Among TFBS prediction methods, we are the first to use graph neural networks and cross-attention networks for representation learning. MultiTF uses dna2vec to extract global contextual features of DNA sequences, DNAshapeR to extract shape features, and the CDPfold model and graph attention network for learning and representation of DNA structural features. Finally, with the help of our cross-attention module, we successfully combine sequence, structural, and shape features to achieve interactive fusion. When comparing MultiTF to other state-of-the-art methods using 165 ENCODE ChIP-seq datasets, we find that MultiTF exhibits average ACC, ROC-AUC, and PR-AUC values of 0.911, 0.978, and 0.982, respectively. The results show that MultiTF achieves unprecedented prediction accuracy compared to previous TFBS prediction models. In addition, our visual analysis of structural features provides interpretability for the prediction results.
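The abstract does not spell out how the cross-attention module is built, so the following is only a generic sketch of scaled dot-product cross-attention, in which one modality (here, sequence features) queries another (here, shape features). The function names, toy feature values, and the absence of learned projection matrices are all assumptions made for illustration; this is not MultiTF's actual implementation.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention.

    Each query row (one modality) attends over all key rows (another
    modality) and returns a weighted average of the value rows.
    Learned projections W_q, W_k, W_v are omitted for brevity.
    """
    d = len(keys[0])
    fused = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        weights = softmax(scores)
        fused.append(
            [
                sum(w * v[j] for w, v in zip(weights, values))
                for j in range(len(values[0]))
            ]
        )
    return fused


# Hypothetical toy inputs: 3 sequence positions and 2 shape positions,
# both already embedded in a shared 2-dimensional space.
seq_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
shape_feats = [[0.5, 0.5], [2.0, 0.0]]

# Sequence features query the shape features.
fused = cross_attention(seq_feats, shape_feats, shape_feats)
```

Each row of `fused` is a convex combination of the shape feature rows, weighted by how strongly the corresponding sequence position matches each shape position. A full model would typically add learned projections, multiple heads, and a symmetric pass in which the other modalities act as queries.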
Pages: 10