Cross-validation strategies in QSPR modelling of chemical reactions

Cited by: 14
Authors
Rakhimbekova, A. [1]
Akhmetshin, T. N. [1, 2]
Minibaeva, G. I. [1]
Nugmanov, R. I. [1]
Gimadiev, T. R. [3]
Madzhidov, T. I. [1]
Baskin, I. I. [4]
Varnek, A. [2]
Affiliations
[1] Kazan Fed Univ, AM Butlerov Inst Chem, Kazan, Russia
[2] Univ Strasbourg, UMR 7140 CNRS, Lab Chemoinformat, Strasbourg, France
[3] Hokkaido Univ, Inst Chem React Design & Discovery, Sapporo, Hokkaido, Japan
[4] Technion Israel Inst Technol, Dept Mat Sci & Engn, Haifa, Israel
Funding
Russian Science Foundation
Keywords
Validation; QSPR; chemical reactions; rate constant prediction; reaction rate; structure-reactivity modelling
DOI
10.1080/1062936X.2021.1883107
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
In this article, we consider cross-validation of quantitative structure-property relationship (QSPR) models for reactions and show that the conventional k-fold cross-validation (CV) procedure gives an 'optimistically' biased assessment of prediction performance. To address this issue, we suggest two model cross-validation strategies: 'transformation-out' CV and 'solvent-out' CV. Unlike conventional k-fold cross-validation, which does not consider the nature of the objects, the proposed procedures provide an unbiased estimate of the predictive performance of models for novel types of structural transformations in chemical reactions and for reactions proceeding under new conditions. Both suggested strategies have been applied to predict the rate constants of bimolecular elimination reactions, nucleophilic substitution reactions, and Diels-Alder cycloaddition. All suggested cross-validation methodologies, together with a tutorial, are implemented in the open-source software package CIMtools (https://github.com/cimm-kzn/CIMtools).
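The idea behind 'solvent-out' and 'transformation-out' CV can be illustrated with a minimal sketch of a grouped split, in which every reaction sharing a group label (a solvent, or a type of structural transformation) is placed entirely in either the training or the test part of a fold. This is not the CIMtools API: the function name `group_out_folds`, the greedy fold balancing, and the toy solvent labels are all assumptions made for illustration, and only the Python standard library is used.

```python
from collections import defaultdict


def group_out_folds(groups, n_folds):
    """Yield (train_idx, test_idx) pairs in which no group label
    appears on both sides of a split (hypothetical helper)."""
    # Collect the indices of the reactions belonging to each group label.
    by_group = defaultdict(list)
    for idx, label in enumerate(groups):
        by_group[label].append(idx)

    # Greedy balancing (an assumption, not the authors' procedure):
    # place the largest groups first, each into the currently smallest fold.
    folds = [[] for _ in range(n_folds)]
    ordered = sorted(by_group, key=lambda g: (-len(by_group[g]), str(g)))
    for label in ordered:
        smallest = min(range(n_folds), key=lambda i: len(folds[i]))
        folds[smallest].extend(by_group[label])

    # Each fold serves once as the test set; the rest form the training set.
    for i in range(n_folds):
        test = sorted(folds[i])
        train = sorted(j for k in range(n_folds) if k != i for j in folds[k])
        yield train, test


# Toy example: one solvent label per reaction (made-up data).
solvents = ["water", "ethanol", "water", "DMSO", "ethanol", "DMSO", "water"]
splits = list(group_out_folds(solvents, n_folds=3))

for train, test in splits:
    held_out = sorted({solvents[i] for i in test})
    print("held-out solvents:", held_out, "| test indices:", test)
```

In a conventional k-fold split, reactions in the same solvent would typically land in both the training and test sets, which is the source of the optimistic bias the abstract describes. Passing transformation labels instead of solvent labels gives the analogous 'transformation-out' split.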
Pages: 207-219
Number of pages: 13