Enhancing generalizability and performance in drug-target interaction identification by integrating pharmacophore and pre-trained models

Cited by: 1
Authors
Zhang, Zuolong [1]
He, Xin [1,4]
Long, Dazhi [5]
Luo, Gang [2]
Chen, Shengbo [3]
Affiliations
[1] Henan Univ, Sch Software, Kaifeng 475000, Henan, Peoples R China
[2] Nanchang Univ, Sch Math & Comp Sci, Nanchang 330031, Jiangxi, Peoples R China
[3] Henan Univ, Henan Engn Res Ctr Intelligent Technol & Applicat, Kaifeng 475000, Henan, Peoples R China
[4] Henan Univ, Henan Int Joint Lab Intelligent Network Theory &, Kaifeng 475000, Henan, Peoples R China
[5] Jian Third Peoples Hosp, Dept Urol, Jian 343000, Jiangxi, Peoples R China
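Funding
National Natural Science Foundation of China
Keywords
Prediction
DOI
10.1093/bioinformatics/btae240
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract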
Motivation: In drug discovery, it is crucial to assess drug-target binding affinity (DTA). Although molecular docking is widely used, its computational cost limits its application in large-scale virtual screening. Deep learning-based methods learn virtual scoring functions from labeled datasets and can predict affinity quickly. However, they have three limitations. First, existing methods consider only the atom-bond graph or one-dimensional sequence representations of compounds, ignoring information about functional groups (pharmacophores) with specific biological activities. Second, models trained on limited labeled datasets fail to learn comprehensive embedding representations of compounds and proteins, resulting in poor generalization in complex scenarios. Third, existing feature fusion methods cannot adequately capture contextual interaction information.

Results: To address these limitations, we propose a novel DTA prediction method named HeteroDTA. Specifically, a multi-view compound feature extraction module is constructed to model the atom-bond graph and the pharmacophore graph. The residue contact graph and the protein sequence are also used to model protein structure and function. Moreover, to enhance generalization and reduce the dependence on task-specific labeled data, pre-trained models are used to initialize the atomic features of the compounds and the embedding representations of the protein sequence. A context-aware nonlinear feature fusion method is also proposed to learn interaction patterns between compounds and proteins. Experimental results on public benchmark datasets show that HeteroDTA significantly outperforms existing methods. In addition, HeteroDTA shows excellent generalization performance in cold-start experiments and superior representation learning for drug-target pairs. Finally, the effectiveness of HeteroDTA is demonstrated in a real-world drug discovery study.

Availability and implementation: The source code and data are available at https://github.com/daydayupzzl/HeteroDTA.
Pages: i539-i547
Page count: 9