Analyzing Learned Molecular Representations for Property Prediction

Cited by: 1095
Authors
Yang, Kevin [1]
Swanson, Kyle [1]
Jin, Wengong [1]
Coley, Connor [2]
Eiden, Philipp [3]
Gao, Hua [4]
Guzman-Perez, Angel [4]
Hopper, Timothy [4]
Kelley, Brian [5]
Mathea, Miriam [3]
Palmer, Andrew [3]
Settels, Volker [3]
Jaakkola, Tommi [1]
Jensen, Klavs [2]
Barzilay, Regina [1]
Affiliations
[1] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
[2] MIT, Dept Chem Engn, Cambridge, MA 02139 USA
[3] BASF SE, D-67063 Ludwigshafen, Germany
[4] Amgen Inc, Cambridge, MA 02141 USA
[5] Novartis Inst BioMed Res, Cambridge, MA 02139 USA
Keywords
Algorithmic solutions - Convolutional model - Molecular descriptors - Molecular fingerprint - Molecular properties - Molecular representations - Neural architectures - Property predictions
DOI
10.1021/acs.jcim.9b00237
Chinese Library Classification number
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
Advancements in neural machinery have led to a wide range of algorithmic solutions for molecular property prediction. Two classes of models in particular have yielded promising results: neural networks applied to computed molecular fingerprints or expert-crafted descriptors and graph convolutional neural networks that construct a learned molecular representation by operating on the graph structure of the molecule. However, recent literature has yet to clearly determine which of these two methods is superior when generalizing to new chemical space. Furthermore, prior research has rarely examined these new models in industry research settings in comparison to existing employed models. In this paper, we benchmark models extensively on 19 public and 16 proprietary industrial data sets spanning a wide variety of chemical end points. In addition, we introduce a graph convolutional model that consistently matches or outperforms models using fixed molecular descriptors as well as previous graph neural architectures on both public and proprietary data sets. Our empirical findings indicate that while approaches based on these representations have yet to reach the level of experimental reproducibility, our proposed model nevertheless offers significant improvements over models currently used in industrial workflows.
Pages: 3370-3388
Page count: 19