Optimal selection of learning data for highly accurate QSAR prediction of chemical biodegradability: a machine learning-based approach

Cited by: 1
Authors
Takeda, K. [1]
Takeuchi, K. [2]
Sakuratani, Y. [2]
Kimbara, K. [1]
Affiliations
[1] Shizuoka Univ, Grad Sch Integrated Sci & Technol, Hamamatsu, Japan
[2] Natl Inst Technol & Evaluat, Chem Management Ctr, Tokyo, Japan
Keywords
Biodegradation; QSAR; BOD; OECD; machine learning; MODELS; READY
DOI
10.1080/1062936X.2023.2251889
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Prior to the manufacture of new chemicals, regulations mandate a thorough review of the chemicals under risk management. This review involves evaluating their effects on the environment and human health. To assess these effects, a review report that conforms to the OECD Test Guidelines must be submitted to the regulatory body. One of the essential components of the report is an assessment of the biodegradability of chemicals in the environment. In addition to conventional methods, quantitative structure-activity relationship (QSAR) models have been developed to predict the properties of chemicals based on their structural features. Although a greater number of chemicals in the learning set may enhance the prediction accuracy, it may also decrease accuracy because chemicals with different structural features and properties become mixed together. To improve the prediction performance, it is recommended to use only the data appropriate for biodegradability prediction as the training set. In this study, we propose a novel approach for the optimal selection of the training set that enables highly accurate prediction of the biodegradability of chemicals by QSAR. Our findings indicate that the proposed method effectively reduces the root mean squared error and improves the prediction accuracy.
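The abstract does not describe the selection procedure itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical toy data set of two-dimensional descriptor vectors with BOD-like degradation percentages, uses Euclidean nearest-neighbour selection as a stand-in selection rule, and predicts with a simple subset mean instead of a real QSAR model. All names (select_training_subset, predict_mean, TRAINING, TEST) are illustrative.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def select_training_subset(query, training, k):
    """Return the k training records closest to the query descriptor vector.

    Assumption: Euclidean distance in descriptor space stands in for
    whatever selection criterion a real workflow would use.
    """
    return sorted(training, key=lambda rec: math.dist(rec[0], query))[:k]


def predict_mean(query, training, k=None):
    """Predict the mean target of the selected subset (k=None uses all data)."""
    subset = training if k is None else select_training_subset(query, training, k)
    return sum(y for _, y in subset) / len(subset)


# Hypothetical toy data: (descriptor vector, degradation percentage).
# Two structurally distinct clusters with very different degradability.
TRAINING = [
    ((0.0, 0.1), 80.0),
    ((0.1, 0.0), 85.0),
    ((0.2, 0.1), 75.0),
    ((5.0, 5.1), 10.0),
    ((5.1, 4.9), 5.0),
    ((4.9, 5.0), 15.0),
]
TEST = [((0.1, 0.1), 82.0), ((5.0, 5.0), 9.0)]

y_true = [y for _, y in TEST]
pred_all = [predict_mean(x, TRAINING) for x, _ in TEST]       # mixed training set
pred_sel = [predict_mean(x, TRAINING, k=3) for x, _ in TEST]  # selected subset

rmse_all = rmse(y_true, pred_all)
rmse_sel = rmse(y_true, pred_sel)
print(f"RMSE with all training data:  {rmse_all:.2f}")
print(f"RMSE with selected subset:    {rmse_sel:.2f}")
```

On this contrived data, pooling both clusters drags every prediction toward the overall mean, while restricting the training set to structurally similar records lowers the RMSE. Real data sets will not separate this cleanly, and the size of any improvement depends on the descriptors, the model, and the selection rule.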
Pages: 729-743
Number of pages: 15