Toward generalizable structure-based deep learning models for protein-ligand interaction prediction: Challenges and strategies

Times cited: 4
Authors
Moon, Seokhyun [1]
Zhung, Wonho [1]
Kim, Woo Youn [1,2,3,4]
Affiliations
[1] Korea Adv Inst Sci & Technol, Dept Chem, Daejeon, South Korea
[2] Korea Adv Inst Sci & Technol, AI Inst, Daejeon, South Korea
[3] HITS Inc, Seoul, South Korea
[4] Korea Adv Inst Sci & Technol, Dept Chem, 291 Daehak Ro, Daejeon 34141, South Korea
Keywords
drug discovery; generalizability; protein-ligand interaction; structure-based deep learning; virtual screening; CSAR BENCHMARK EXERCISE; OUT CROSS-VALIDATION; SCORING FUNCTIONS; BINDING-AFFINITY; BLIND PREDICTION; ACCURATE DOCKING; NEURAL-NETWORK; DATA SETS; DATABASE; DIVERSE
DOI
10.1002/wcms.1705
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline code
0703
Abstract
Accurate and rapid prediction of protein-ligand interactions (PLIs) is the fundamental challenge of drug discovery. Deep learning methods have been harnessed for this purpose, yet the insufficient generalizability of PLI prediction models limits their broader impact on practical applications. Here, we highlight the significance of PLI model generalizability by conceiving of PLIs as a function defined on infinitely diverse protein-ligand pairs and binding poses. To examine the generalization challenges in PLI prediction, we comprehensively explore evaluation strategies for assessing generalizability fairly. Moreover, we categorize structure-based PLI models according to the strategies they leverage to learn generalizable features from structure-based PLI data. Finally, we conclude the review by emphasizing the need for accurate pose-prediction methods, which are a prerequisite for more accurate PLI predictions.
This article is categorized under:
Data Science > Artificial Intelligence/Machine Learning
Data Science > Chemoinformatics
Structure and Mechanism > Computational Biochemistry and Biophysics
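The abstract refers to evaluation strategies for assessing generalizability fairly, and the keyword list includes leave-out cross-validation. As a minimal sketch (not the authors' protocol), assuming every complex has already been assigned a protein-cluster label (for example by sequence-identity clustering, which is not shown), a leave-cluster-out split keeps all complexes of one cluster in the same fold, so the test proteins are never seen during training. The function name, complex IDs, and cluster labels below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def leave_cluster_out_folds(cluster_of: Dict[str, str]) -> List[Tuple[List[str], List[str]]]:
    """Hypothetical helper: return one (train, test) split per cluster.

    The test set is exactly the complexes of the held-out cluster;
    every other complex goes into the training set.
    """
    # Group complex IDs by their (assumed, precomputed) cluster label.
    members: Dict[str, List[str]] = defaultdict(list)
    for complex_id, cluster in cluster_of.items():
        members[cluster].append(complex_id)

    folds = []
    for held_out in sorted(members):
        test = sorted(members[held_out])
        train = sorted(
            cid for c, ids in members.items() if c != held_out for cid in ids
        )
        folds.append((train, test))
    return folds


# Hypothetical complexes labeled with hypothetical protein-cluster IDs.
clusters = {
    "cpx1": "kinase", "cpx2": "kinase",
    "cpx3": "protease", "cpx4": "protease",
    "cpx5": "nuclear_receptor",
}

for train, test in leave_cluster_out_folds(clusters):
    # No cluster may appear on both sides of a split.
    assert not {clusters[c] for c in train} & {clusters[c] for c in test}
    print("test:", test, "train:", train)
```

A random split of the same data would typically place complexes from the same cluster in both training and test sets, which can inflate apparent performance on proteins the model has effectively already seen.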
Pages: 26
Cited references: 174