Molecular set representation learning

Cited by: 7
Authors
Boulougouri, Maria [1]
Vandergheynst, Pierre [1]
Probst, Daniel [1]
Affiliations
[1] Ecole Polytech Fed Lausanne, Inst Elect & Micro Engn, Sch Engn, Signal Proc Lab 2, Lausanne, Switzerland
Funding
Swiss National Science Foundation
Keywords
Machine learning
DOI
10.1038/s42256-024-00856-0
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Computational representation of molecules can take many forms, including graphs, string encodings of graphs, binary vectors or learned embeddings in the form of real-valued vectors. These representations are then used in downstream classification and regression tasks using a wide range of machine learning models. However, existing models come with limitations, such as the requirement for clearly defined chemical bonds, which often do not represent the true underlying nature of a molecule. Here we propose a framework for molecular machine learning tasks based on set representation learning. We show that learning on sets of atom invariants alone reaches the performance of state-of-the-art graph-based models on the most-used chemical benchmark datasets and that introducing a set representation layer into graph neural networks can surpass the performance of established methods in the domains of chemistry, biology and material science. We introduce specialized set representation-based neural network architectures for reaction-yield and protein-ligand binding-affinity prediction. Overall, we show that the technique we denote molecular set representation learning is both an alternative and an extension to graph neural network architectures for machine learning tasks on molecules, molecule complexes and chemical reactions.

Machine learning methods for molecule predictions use various representations of molecules such as in the form of strings or graphs. As an extension of graph representation learning, Probst and colleagues propose to represent a molecule as a set of atoms, to better capture the underlying chemical nature, and demonstrate improved performance in a range of machine learning tasks.
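
To illustrate what learning on a set of atom invariants can look like, here is a minimal sketch of a permutation-invariant set encoder in the sum-pooling ("Deep Sets") style. It is not the architecture from the paper: the function names, the choice of four invariants per atom, the layer sizes and the random weights standing in for learned parameters are all assumptions made for illustration. The encoder only sees an unordered collection of atoms, not an explicit bond list.

```python
import math
import random


def make_linear(n_in, n_out, rng):
    """Random weight matrix (n_out rows of n_in weights); a stand-in for learned parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def linear_relu(weights, x):
    """Apply a linear map followed by ReLU to a single vector."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


def encode_atom_set(atoms, phi_weights, rho_weights):
    """Embed each atom independently, sum over the set, then transform the pooled vector."""
    embedded = [linear_relu(phi_weights, atom) for atom in atoms]
    # math.fsum returns the correctly rounded sum, so the result does not depend on atom order.
    pooled = [math.fsum(column) for column in zip(*embedded)]
    return linear_relu(rho_weights, pooled)


rng = random.Random(0)
phi_weights = make_linear(4, 8, rng)  # per-atom map: 4 invariants -> 8 features
rho_weights = make_linear(8, 3, rng)  # set-level map: 8 pooled features -> 3 outputs

# Hypothetical invariants for the heavy atoms of ethanol:
# (atomic number, heavy-atom degree, attached hydrogens, formal charge)
ethanol = [
    (6, 1, 3, 0),  # CH3 carbon
    (6, 2, 2, 0),  # CH2 carbon
    (8, 1, 1, 0),  # OH oxygen
]

embedding = encode_atom_set(ethanol, phi_weights, rho_weights)
shuffled = encode_atom_set(list(reversed(ethanol)), phi_weights, rho_weights)
```

Because the pooling step is a sum over atoms, `embedding` and `shuffled` agree no matter how the atoms are ordered, which is the property that lets a model treat a molecule as a set rather than as a sequence or graph.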
Pages: 754-763
Number of pages: 14