Predictive Modeling of Pesticides Reproductive Toxicity in Earthworms Using Interpretable Machine-Learning Techniques on Imbalanced Data

Cited by: 0
Authors
Kotli, Mihkel [1]
Piir, Geven [1]
Maran, Uko [1]
Affiliations
[1] Univ Tartu, Inst Chem, EE-50411 Tartu, Estonia
Source
ACS OMEGA | 2025 / Vol. 10 / Issue 05
Funding
European Union Horizon 2020;
Keywords
QSAR MODELS; CHEMICALS; SORPTION;
DOI
10.1021/acsomega.4c09719
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
The earthworm is a key indicator species in soil ecosystems. This makes the reproductive toxicity of chemical compounds to earthworms an important property to determine and makes computational models necessary for descriptive and predictive purposes. Thus, the aim was to develop an advanced Quantitative Structure-Activity Relationship (QSAR) modeling approach for this complex property with imbalanced data. The approach integrated gradient-boosted decision trees as classifiers with a genetic algorithm for feature selection and Bayesian optimization for hyperparameter tuning. An additional goal was to use SHAP values to analyze and interpret the structural features, encoded by the molecular descriptors, that contribute to pesticide toxicity and nontoxicity; the most notable of these are solvation entropy and the number of hydrolyzable bonds. The final model was constructed as a stacked ensemble that combined the strengths of the individual models. Evaluation of this model with an external test set of 147 compounds demonstrated a well-defined applicability domain and sufficient predictive capability, with a Balanced Accuracy of 77%. The model representation follows FAIR principles and is available on QsarDB.org.
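The abstract reports Balanced Accuracy rather than plain accuracy, which is the usual choice for imbalanced classes. The sketch below is illustrative only and is not the authors' code: it assumes binary labels (1 = toxic, 0 = nontoxic) and uses a made-up 90/10 class split, not the paper's data, to show how a classifier that always predicts the majority class can look strong on plain accuracy while scoring 0.5 on balanced accuracy (the mean of sensitivity and specificity).

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = toxic, 0 = nontoxic)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # recall on the toxic class
    specificity = tn / (tn + fp)  # recall on the nontoxic class
    return (sensitivity + specificity) / 2


# Hypothetical imbalanced labels: 90 nontoxic and 10 toxic compounds.
y_true = [0] * 90 + [1] * 10
# A trivial classifier that always predicts the majority (nontoxic) class.
y_majority = [0] * 100

plain = accuracy(y_true, y_majority)
balanced = balanced_accuracy(y_true, y_majority)
print(f"accuracy={plain:.2f}, balanced accuracy={balanced:.2f}")
```

On this toy split the trivial classifier is no better than chance on balanced accuracy, which is why the metric is informative for a data set where one class dominates.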
Pages: 4732-4744
Page count: 13