Bimolecular Nucleophilic Substitution Reactions: Predictive Models for Rate Constants and Molecular Reaction Pairs Analysis

Times cited: 27
Authors
Gimadiev, Timur [1,2]
Madzhidov, Timur [1]
Tetko, Igor [3]
Nugmanov, Ramil [1]
Casciuc, Iury [2]
Klimchuk, Olga [2]
Bodrov, Andrey [1,5]
Polishchuk, Pavel [4]
Antipin, Igor [1]
Varnek, Alexandre [2]
Affiliations
[1] Kazan Fed Univ, Butlerov Inst Chem, Lab Chemoinformat & Mol Modeling, Kremlyovskaya Str 18, Kazan, Russia
[2] Univ Strasbourg, CNRS, UMR 7140, Lab Chemoinformat, 1 Rue Blaise Pascal, F-67000 Strasbourg, France
[3] Helmholtz Zentrum Munchen German Res Ctr Environm, Inst Struct Biol, Ingolstadter Landstr 1, D-85764 Neuherberg, Germany
[4] Palacky Univ, Fac Med & Dent, Inst Mol & Translat Med, Hnevotinska 1333-5, Olomouc 77900, Czech Republic
[5] Kazan State Med Univ, Dept Gen & Organ Chem, Kazan, Russia
Funding
Russian Science Foundation
Keywords
bimolecular nucleophilic substitution reactions; Condensed Graph of Reaction; Matched Reaction Pairs; Support Vector Regression; Generative Topographic Mapping; models applicability domain; SOLVATOCHROMIC COMPARISON METHOD; APPLICABILITY DOMAIN; S(N)2 REACTIONS; SCALE; GRAPH; REPRESENTATION;
DOI
10.1002/minf.201800104
Chinese Library Classification (CLC) number
R914 [Medicinal chemistry]
Subject classification code
100701
Abstract
Here, we report the visualization, analysis, and modeling of a large set of 4830 SN2 reactions whose rate constants (log k) were measured under different experimental conditions (solvent, temperature). Each reaction was encoded as a single molecular graph, the Condensed Graph of Reaction (CGR), which allowed us to apply conventional chemoinformatics techniques developed for individual molecules. On this basis, a Matched Reaction Pairs approach was proposed and used to analyze substituent effects on the reactivity of substrates and nucleophiles. The data were visualized using Generative Topographic Mapping (GTM). A consensus Support Vector Regression (SVR) model for the rate constant was built. To obtain an unbiased estimate of its performance, cross-validation was performed over unique structural transformations. The model's performance in cross-validation (RMSE = 0.61 log k units) and on the external test set (RMSE = 0.80 log k units) is close to the noise level of the data. Local models built on selected subsets of reactions, run in particular solvents or with particular types of nucleophiles, performed similarly to the model built on the entire set. Finally, four different definitions of the applicability domain of reaction models were examined.
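If, as the abstract suggests, cross-validation folds were formed over unique structural transformations, then all measurements of one transformation (the same reaction in different solvents or at different temperatures) fall into a single fold. This keeps near-duplicate reactions from appearing in both the training and the test data. The sketch below shows such a grouped split and the RMSE metric in plain Python. It assumes each reaction already carries a transformation ID, for example one derived from its CGR. The function names, the toy values, and the mean-value predictor (a placeholder standing in for the authors' consensus SVR) are all hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence


def group_kfold(groups: Sequence[str], n_folds: int = 5, seed: int = 0) -> List[List[int]]:
    """Split sample indices into folds so that every group lands in exactly one fold.

    Here a "group" is assumed to be a unique structural transformation ID:
    all measurements of that transformation (different solvents/temperatures)
    end up in the same fold.
    """
    unique = sorted(set(groups))
    rng = random.Random(seed)
    rng.shuffle(unique)
    # Round-robin assignment of whole groups to folds.
    fold_of = {g: i % n_folds for i, g in enumerate(unique)}
    folds: List[List[int]] = [[] for _ in range(n_folds)]
    for idx, g in enumerate(groups):
        folds[fold_of[g]].append(idx)
    return folds


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error, in the units of y (here log k)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def cross_validate(X, y, groups, fit: Callable, n_folds: int = 5, seed: int = 0) -> float:
    """Out-of-fold RMSE for any `fit(X_train, y_train) -> predict` callable."""
    folds = group_kfold(groups, n_folds, seed)
    y_pred = [0.0] * len(y)
    for test_idx in folds:
        if not test_idx:  # more folds than groups leaves some folds empty
            continue
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            y_pred[i] = predict(X[i])
    return rmse(y, y_pred)


def mean_baseline(X_train, y_train):
    """Placeholder model (NOT an SVR): always predicts the training-set mean."""
    mu = sum(y_train) / len(y_train)
    return lambda x: mu


# Toy, made-up values for illustration only (not data from the paper).
groups = ["T1", "T1", "T2", "T2", "T3", "T3"]  # hypothetical transformation IDs
X = [[0.0], [1.0], [0.0], [1.0], [0.0], [1.0]]  # hypothetical condition descriptors
y = [-2.0, -1.5, -3.0, -2.4, -1.0, -0.6]        # hypothetical log k values
score = cross_validate(X, y, groups, mean_baseline, n_folds=3)
print(f"transformation-grouped CV RMSE: {score:.3f} log k units")
```

Any regressor can be plugged in through the `fit` callable. scikit-learn's `GroupKFold` performs the same kind of group-respecting split when combined with its `SVR` estimator. A random split over individual measurements would typically report a lower, overly optimistic RMSE, because the same transformation would then be seen during training.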
Pages: 14