Graph neural processes for molecules: an evaluation on docking scores and strategies to improve generalization

Cited by: 0
Authors
Garcia-Ortegon, Miguel [1 ,2 ,3 ]
Seal, Srijit [4 ]
Rasmussen, Carl [2 ]
Bender, Andreas [3 ]
Bacallado, Sergio [1 ]
Affiliations
[1] Univ Cambridge, Stat Lab, Wilberforce Rd, Cambridge CB3 0WA, England
[2] Univ Cambridge, Dept Engn, Trumpington St, Cambridge CB2 1PZ, England
[3] Univ Cambridge, Dept Chem, Lensfield Rd, Cambridge CB2 1EW, England
[4] Broad Inst MIT & Harvard, Imaging Platform, 415 Main St, Cambridge, MA 02142 USA
Source
JOURNAL OF CHEMINFORMATICS | 2024, Vol. 16, No. 1
Funding
Wellcome Trust (UK);
Keywords
PROTEOCHEMOMETRICS; LIGANDS;
DOI
10.1186/s13321-024-00904-2
Chinese Library Classification
O6 [Chemistry];
Discipline Code
0703;
Abstract
Neural processes (NPs) are models for meta-learning which output uncertainty estimates. So far, most studies of NPs have focused on low-dimensional datasets of highly correlated tasks. While these homogeneous datasets are useful for benchmarking, they may not be representative of realistic transfer learning. In particular, applications in scientific research may prove especially challenging owing to the potential novelty of meta-testing tasks. Molecular property prediction is one such research area, characterized by sparse datasets of many functions on a shared molecular space. In this paper, we study the application of graph NPs to molecular property prediction with DOCKSTRING, a diverse dataset of docking scores. Graph NPs show competitive performance in few-shot learning tasks relative to supervised learning baselines common in cheminformatics, as well as to alternative techniques for transfer learning and meta-learning. To increase meta-generalization to divergent test functions, we propose fine-tuning strategies that adapt the parameters of NPs. We find that adaptation can substantially increase NPs' regression performance while maintaining good calibration of uncertainty estimates. Finally, we present a Bayesian optimization experiment which showcases the potential advantages of NPs over Gaussian processes in iterative screening. Overall, our results suggest that NPs on molecular graphs hold great potential for molecular property prediction in the low-data setting.

Scientific contribution: Neural processes are a family of meta-learning algorithms which deal with data scarcity by transferring information across tasks and making probabilistic predictions. We evaluate their performance on molecular regression and optimization tasks using docking scores, finding that they outperform classical single-task and transfer-learning models. We examine the issue of generalization to divergent test tasks, a general concern for meta-learning algorithms in science, and propose strategies to alleviate it.
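To make the conditional-prediction setup concrete, the sketch below shows a minimal conditional neural process head in PyTorch: a context set of (molecule, score) pairs is encoded and pooled into a task representation, which the decoder combines with each query molecule to output a predictive mean and standard deviation. This is an illustrative sketch only, assuming molecules are already embedded as fixed-size vectors; the paper instead uses graph neural encoders over molecular graphs and DOCKSTRING docking scores, and all names, dimensions, and architectural choices here are hypothetical rather than the authors' implementation.

```python
# Minimal, illustrative conditional neural process (CNP) sketch.
# Assumes precomputed fixed-size molecule embeddings; hypothetical names and sizes.
import torch
import torch.nn as nn

class ConditionalNeuralProcess(nn.Module):
    def __init__(self, x_dim: int, r_dim: int = 128):
        super().__init__()
        # Encoder: maps each (molecule embedding, score) context pair to a representation.
        self.encoder = nn.Sequential(
            nn.Linear(x_dim + 1, r_dim), nn.ReLU(), nn.Linear(r_dim, r_dim)
        )
        # Decoder: predicts mean and log-variance for a query molecule, given the
        # aggregated context representation of the task.
        self.decoder = nn.Sequential(
            nn.Linear(x_dim + r_dim, r_dim), nn.ReLU(), nn.Linear(r_dim, 2)
        )

    def forward(self, x_context, y_context, x_target):
        # Encode the context set and aggregate with permutation-invariant mean pooling.
        r = self.encoder(torch.cat([x_context, y_context], dim=-1)).mean(dim=0)
        r = r.expand(x_target.shape[0], -1)
        out = self.decoder(torch.cat([x_target, r], dim=-1))
        mean, log_var = out[:, :1], out[:, 1:]
        return mean, log_var.mul(0.5).exp()  # predictive mean and standard deviation

# Few-shot usage: condition on a handful of docked molecules from a new task.
x_ctx, y_ctx = torch.randn(16, 256), torch.randn(16, 1)   # 16 context molecules
x_tgt = torch.randn(100, 256)                             # 100 query molecules
model = ConditionalNeuralProcess(x_dim=256)
mu, sigma = model(x_ctx, y_ctx, x_tgt)                    # predictions with uncertainty
```

The same interface extends to the latent-variable NP variants and to fine-tuning of the encoder/decoder parameters on a new task, as discussed in the abstract; those extensions are not shown here.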
Pages: 18