SAM-DTA: a sequence-agnostic model for drug-target binding affinity prediction

Cited by: 8

Authors:

Hu, Zhiqiang ^{[1]}

Liu, Wenfeng ^{[2]}

Zhang, Chenbin ^{[3]}

Huang, Jiawen ^{[2]}

Zhang, Shaoting ^{[4,5]}

Yu, Huiqun ^{[6,7]}

Xiong, Yi ^{[8]}

Liu, Hao ^{[9]}

Ke, Song ^{[10]}

Hong, Liang ^{[9]}

Affiliations:

[1] Sense Time Res China, Precis Med Div, Hong Kong, Peoples R China

[2] East China Univ Sci & Technol, Sch Informat Sci & Engn, Shanghai, Peoples R China

[3] Peking Univ, Beijing, Peoples R China

[4] Sensetime, Hong Kong, Peoples R China

[5] Shanghai Artificial Intelligence Lab, Smart Hlth, Shanghai, Peoples R China

[6] ECUST, Dept Comp Sci & Engn, Comp Sci, Shanghai, Peoples R China

[7] Florida Int Univ, Sch Comp Sci, Miami, FL USA

[8] Shanghai Jiao Tong Univ, Sch Life Sci & Biotechnol, Shanghai, Peoples R China

[9] SJTU, Inst Nat Sci, Shanghai, Peoples R China

[10] Matwings Technol, Shanghai, Peoples R China

Source:

BRIEFINGS IN BIOINFORMATICS | 2023 / Vol. 24 / Issue 01

Funding:

National Science Foundation (USA);

Keywords:

drug-target binding affinity; deep learning; sequence-agnostic model; PROTEIN; DOCKING;

DOI:

10.1093/bib/bbac533

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Discipline classification codes:

071010 ; 081704 ;

Abstract:

Drug-target binding affinity prediction is a fundamental task for drug discovery and has been studied for decades. Most methods follow the canonical paradigm that processes the inputs of the protein (target) and the ligand (drug) separately and then combines them. In this study we demonstrate, surprisingly, that a model can achieve superior performance without access to any protein-sequence-related information. Instead, a protein is characterized completely by the ligands that it interacts with. Specifically, we treat different proteins separately, training them jointly in a multi-head manner, so as to learn a robust and universal representation of ligands that generalizes across proteins. Empirical evidence shows that the novel paradigm outperforms its competitive sequence-based counterpart, DeepAffinity, with a Mean Squared Error (MSE) of 0.4261 versus 0.7612 and an R-Square of 0.7984 versus 0.6570. We also investigate the transfer learning scenario, in which unseen proteins are encountered after the initial training, and cross-dataset evaluation for prospective studies. The results reveal the robustness of the proposed model in generalizing to unseen proteins as well as in predicting future data. Source code and data are available at https://github.com/huzqatpku/SAM-DTA.
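The multi-head idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository linked above for that); the class name `MultiHeadAffinityModel`, the layer sizes, the random initialization, and the use of a fixed-length numeric ligand feature vector are all assumptions made for illustration. The point it shows is structural: one ligand encoder is shared by every protein, and each protein is represented only by its own output head, never by its sequence. The `r_squared` helper assumes the coefficient-of-determination definition, which may differ from the exact metric used in the paper.

```python
import random


class MultiHeadAffinityModel:
    """Toy sequence-agnostic model (hypothetical): shared ligand encoder,
    one linear output head per protein."""

    def __init__(self, n_features, n_hidden, protein_ids, seed=0):
        rng = random.Random(seed)
        # Shared encoder weights (n_hidden x n_features), used for all proteins.
        self.encoder = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)]
                        for _ in range(n_hidden)]
        # One head per protein, keyed by an opaque ID; no sequence is used.
        self.heads = {pid: [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
                      for pid in protein_ids}

    def encode(self, ligand):
        # Linear layer followed by ReLU; output is protein-independent.
        return [max(0.0, sum(w * x for w, x in zip(row, ligand)))
                for row in self.encoder]

    def predict(self, protein_id, ligand):
        # Route the shared ligand representation through the protein's head.
        hidden = self.encode(ligand)
        return sum(w * h for w, h in zip(self.heads[protein_id], hidden))


def mse(y_true, y_pred):
    """Mean squared error between measured and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination (assumed definition of R-Square)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    model = MultiHeadAffinityModel(4, 3, ["P1", "P2"])
    ligand = [1.0, 0.0, 1.0, 0.0]  # placeholder ligand features
    print(model.predict("P1", ligand), model.predict("P2", ligand))
```

In a design like this, handling a protein that was not seen during training would mean adding a new head and fitting it on that protein's ligands while reusing the shared encoder, which is one plausible reading of the transfer learning scenario the abstract mentions.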


Pages: 15
