Molecular set representation learning

Cited by: 7
Authors
Boulougouri, Maria [1]
Vandergheynst, Pierre [1]
Probst, Daniel [1]
Affiliations
[1] Ecole Polytech Fed Lausanne, Inst Elect & Micro Engn, Sch Engn, Signal Proc Lab 2, Lausanne, Switzerland
Funding
Swiss National Science Foundation;
Keywords
Machine learning;
DOI
10.1038/s42256-024-00856-0
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Computational representation of molecules can take many forms, including graphs, string encodings of graphs, binary vectors or learned embeddings in the form of real-valued vectors. These representations are then used in downstream classification and regression tasks using a wide range of machine learning models. However, existing models come with limitations, such as the requirement for clearly defined chemical bonds, which often do not represent the true underlying nature of a molecule. Here we propose a framework for molecular machine learning tasks based on set representation learning. We show that learning on sets of atom invariants alone reaches the performance of state-of-the-art graph-based models on the most-used chemical benchmark datasets and that introducing a set representation layer into graph neural networks can surpass the performance of established methods in the domains of chemistry, biology and material science. We introduce specialized set representation-based neural network architectures for reaction-yield and protein-ligand binding-affinity prediction. Overall, we show that the technique we denote molecular set representation learning is both an alternative and an extension to graph neural network architectures for machine learning tasks on molecules, molecule complexes and chemical reactions.

Machine learning methods for molecule predictions use various representations of molecules, such as strings or graphs. As an extension of graph representation learning, Probst and colleagues propose to represent a molecule as a set of atoms, to better capture the underlying chemical nature, and demonstrate improved performance in a range of machine learning tasks.
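The idea of learning on a set of atom invariants, with no bond information, can be illustrated with a minimal sketch. This is not the authors' architecture: it is a generic sum-pooling set encoder with random, untrained weights, and the four per-atom features (atomic number, heavy-atom degree, attached hydrogens, formal charge) as well as all function names are assumptions chosen for illustration. What it shows is the property a set representation relies on: the output does not depend on the order in which atoms are listed.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Random dense layer as (weights, biases); untrained, illustration only."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-1, 1) for _ in range(n_out)]
    return weights, biases

def dense_tanh(layer, x):
    """Apply one dense layer followed by a tanh non-linearity."""
    weights, biases = layer
    return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def encode_set(atoms, phi, rho):
    """Embed each atom with phi, sum-pool over the set, then apply rho."""
    embedded = [dense_tanh(phi, atom) for atom in atoms]
    # Summation is the permutation-invariant step; no bonds are used.
    pooled = [math.fsum(column) for column in zip(*embedded)]
    return dense_tanh(rho, pooled)

rng = random.Random(0)
phi = make_layer(4, 8, rng)  # per-atom embedding (hypothetical sizes)
rho = make_layer(8, 3, rng)  # readout on the pooled vector

# Hypothetical atom invariants for the heavy atoms of ethanol:
# (atomic number, heavy-atom degree, attached hydrogens, formal charge)
ethanol = [(6, 1, 3, 0), (6, 2, 2, 0), (8, 1, 1, 0)]

out = encode_set(ethanol, phi, rho)
shuffled = encode_set(list(reversed(ethanol)), phi, rho)
print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(out, shuffled)))
```

In a trained model the two layers would be learned and the pooled vector fed to a classification or regression head; the sketch only demonstrates order independence of the molecule-level vector.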
Pages: 754-763
Number of pages: 14