Predictive Modeling of Pesticides Reproductive Toxicity in Earthworms Using Interpretable Machine-Learning Techniques on Imbalanced Data

Authors
Kotli, Mihkel [1]
Piir, Geven [1]
Maran, Uko [1]
Affiliations
[1] Univ Tartu, Inst Chem, EE-50411 Tartu, Estonia
Source
ACS OMEGA | 2025 / Vol. 10 / Issue 05
Funding
European Union Horizon 2020;
Keywords
QSAR MODELS; CHEMICALS; SORPTION;
DOI
10.1021/acsomega.4c09719
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
The earthworm is a key indicator species in soil ecosystems. This makes the reproductive toxicity of chemical compounds to earthworms a desired property of determination and makes computational models necessary for descriptive and predictive purposes. Thus, the aim was to develop an advanced Quantitative Structure-Activity Relationship modeling approach for this complex property with imbalanced data. The approach integrated gradient-boosted decision trees as classifiers with a genetic algorithm for feature selection and Bayesian optimization for hyperparameter tuning. An additional goal was to analyze and interpret, using SHAP values, the structural features encoded by the molecular descriptors that contribute to pesticide toxicity and nontoxicity, the most notable of which are solvation entropy and a number of hydrolyzable bonds. The final model was constructed as a stacked ensemble of models and combined the strengths of the individual models. Evaluation of this model with an external test set of 147 compounds demonstrated a well-defined applicability domain and sufficient predictive capabilities with a Balanced Accuracy of 77%. The model representation follows FAIR principles and is available on QsarDB.org.
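The abstract reports a Balanced Accuracy of 77% on imbalanced data. As a minimal sketch of how that metric is usually defined for a binary classifier (the mean of sensitivity and specificity), the example below uses made-up labels, with 1 assumed to mean "toxic"; the function name and the data are illustrative assumptions and are not taken from the paper or from the QsarDB model.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # recall on the positive (minority) class
    specificity = tn / (tn + fp)  # recall on the negative (majority) class
    return (sensitivity + specificity) / 2


# Hypothetical imbalanced toy set: 2 positives, 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print("plain accuracy:", plain_accuracy)
print("balanced accuracy:", balanced_accuracy(y_true, y_pred))
```

On data like this, plain accuracy is dominated by the majority class, whereas Balanced Accuracy weights both classes equally, which is presumably why it is the metric reported for an imbalanced data set.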
Pages: 4732-4744
Page count: 13