Enhancing generalizability and performance in drug-target interaction identification by integrating pharmacophore and pre-trained models

Cited by: 1

Authors:

Zhang, Zuolong [1]

He, Xin [1,4]

Long, Dazhi [5]

Luo, Gang [2]

Chen, Shengbo [3]

Affiliations:

[1] Henan Univ, Sch Software, Kaifeng 475000, Henan, Peoples R China

[2] Nanchang Univ, Sch Math & Comp Sci, Nanchang 330031, Jiangxi, Peoples R China

[3] Henan Univ, Henan Engn Res Ctr Intelligent Technol & Applicat, Kaifeng 475000, Henan, Peoples R China

[4] Henan Univ, Henan Int Joint Lab Intelligent Network Theory &, Kaifeng 475000, Henan, Peoples R China

[5] Jian Third Peoples Hosp, Dept Urol, Jian 343000, Jiangxi, Peoples R China

Source:

BIOINFORMATICS | 2024 / Vol. 40

Funding:

National Natural Science Foundation of China;

Keywords:

PREDICTION;

DOI:

10.1093/bioinformatics/btae240

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Subject classification codes:

071010 ; 081704 ;

Abstract:

Motivation: In drug discovery, it is crucial to assess drug-target binding affinity (DTA). Although molecular docking is widely used, its computational cost limits its application in large-scale virtual screening. Deep learning-based methods learn virtual scoring functions from labeled datasets and can predict affinity quickly. However, they have three limitations. First, existing methods consider only the atom-bond graph or one-dimensional sequence representations of compounds, ignoring information about functional groups (pharmacophores) with specific biological activities. Second, reliance on limited labeled datasets prevents these methods from learning comprehensive embedding representations of compounds and proteins, resulting in poor generalization performance in complex scenarios. Third, existing feature fusion methods cannot adequately capture contextual interaction information.

Results: We therefore propose a novel DTA prediction method named HeteroDTA. Specifically, a multi-view compound feature extraction module is constructed to model the atom-bond graph and the pharmacophore graph. The residue contact graph and the protein sequence are also used to model protein structure and function. Moreover, to enhance generalization capability and reduce dependence on task-specific labeled data, pre-trained models are used to initialize the atomic features of the compounds and the embedding representations of the protein sequence. A context-aware nonlinear feature fusion method is also proposed to learn interaction patterns between compounds and proteins. Experimental results on public benchmark datasets show that HeteroDTA significantly outperforms existing methods. In addition, HeteroDTA shows excellent generalization performance in cold-start experiments and superior representation learning ability for drug-target pairs. Finally, the effectiveness of HeteroDTA is demonstrated in a real-world drug discovery study.

Availability and implementation: The source code and data are available at https://github.com/daydayupzzl/HeteroDTA.

Citation

Pages: i539-i547

Page count: 9
