Computational representation of molecules can take many forms, including graphs, string encodings of graphs, binary vectors or learned embeddings in the form of real-valued vectors. These representations are then used in downstream classification and regression tasks with a wide range of machine learning models. However, existing models come with limitations, such as the requirement for clearly defined chemical bonds, which often do not represent the true underlying nature of a molecule. Here we propose a framework for molecular machine learning tasks based on set representation learning. We show that learning on sets of atom invariants alone reaches the performance of state-of-the-art graph-based models on the most-used chemical benchmark datasets and that introducing a set representation layer into graph neural networks can surpass the performance of established methods in the domains of chemistry, biology and materials science. We introduce specialized set representation-based neural network architectures for reaction-yield and protein-ligand binding-affinity prediction. Overall, we show that the technique we denote molecular set representation learning is both an alternative and an extension to graph neural network architectures for machine learning tasks on molecules, molecule complexes and chemical reactions.

Machine learning methods for molecule predictions use various representations of molecules, such as strings or graphs. As an extension of graph representation learning, Probst and colleagues propose to represent a molecule as a set of atoms, to better capture the underlying chemical nature, and demonstrate improved performance in a range of machine learning tasks.
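To make the idea of learning on a set of atom invariants more concrete, the following is a minimal sketch of a permutation-invariant set encoder in the Deep Sets style: each atom is embedded independently, the embeddings are summed, and the pooled vector is transformed once more. It is an illustration under stated assumptions, not the architecture from the paper. The four invariants per atom, the layer sizes, the random weights standing in for learned parameters, and the names `make_weights`, `dense_tanh` and `encode_set` are all hypothetical.

```python
import math
import random


def make_weights(n_in, n_out, rng):
    """Random weight matrix (n_in x n_out); a stand-in for learned parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_out)] for _ in range(n_in)]


def dense_tanh(vec, weights):
    """Apply one dense layer (no bias) with tanh activation to a feature vector."""
    n_out = len(weights[0])
    return [
        math.tanh(sum(v * weights[i][j] for i, v in enumerate(vec)))
        for j in range(n_out)
    ]


def encode_set(atoms, w_phi, w_rho):
    """Embed each atom on its own, sum-pool over atoms, transform the pooled vector."""
    embedded = [dense_tanh(atom, w_phi) for atom in atoms]
    # Summation ignores atom order, so no bonds or atom indexing are needed.
    pooled = [math.fsum(column) for column in zip(*embedded)]
    return dense_tanh(pooled, w_rho)


rng = random.Random(0)
w_phi = make_weights(4, 8, rng)  # per-atom embedding weights (hypothetical sizes)
w_rho = make_weights(8, 3, rng)  # readout weights applied after pooling

# Hypothetical invariants per heavy atom of ethanol:
# [atomic number, attached hydrogens, formal charge, is_aromatic]
ethanol = [
    [6, 3, 0, 0],
    [6, 2, 0, 0],
    [8, 1, 0, 0],
]

original = encode_set(ethanol, w_phi, w_rho)
shuffled = encode_set(list(reversed(ethanol)), w_phi, w_rho)

# The molecule-level vector does not depend on the order of the atoms.
print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(original, shuffled)))
```

Sum pooling is only one possible choice here; any order-independent aggregation (mean, max, or a learned attention-style pooling) would preserve the same property, and the abstract does not specify which one the authors use.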