Development of models predicting biodegradation rate rating with multiple linear regression and support vector machine algorithms

Cited by: 45
Authors
Tang, Weihao [1]
Li, Yanying [1]
Yu, Yang [2]
Wang, Zhongyu [1]
Xu, Tong [1]
Chen, Jingwen [1]
Lin, Jun [2]
Li, Xuehua [1]
Affiliations
[1] Dalian Univ Technol, Sch Environm Sci & Technol, Key Lab Ind Ecol & Environm Engn MOE, Dalian 116024, Peoples R China
[2] Minist Ecol & Environm MEE, Solid Waste & Chem Management Ctr, Beijing 100029, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Biodegradability; Quantitative structure-activity relationship; Multiple linear regression; Support vector machine; Molecular structure descriptors; AEROBIC BIODEGRADATION; READY BIODEGRADABILITY; BIOACCUMULATIVE ORGANICS; CHEMICALS; PERSISTENT; QSAR; POLLUTANTS;
DOI
10.1016/j.chemosphere.2020.126666
Chinese Library Classification number
X [Environmental Science, Safety Science];
Discipline classification code
08 ; 0830 ;
Abstract
Biodegradation is a significant process for removing organic chemicals from water, soil and sediment environments, and therefore biodegradability is critical to evaluate the environmental persistence of organic chemicals. In this study, based on a dataset with 171 compounds, four quantitative structure-activity relationship (QSAR) models were developed for predicting primary and ultimate biodegradation rate rating with multiple linear regression (MLR) and support vector machine (SVM) algorithms. Two MLR models were built with a dataset with carbon atom number ≤ 9, and two SVM models were built with a dataset with carbon atom number > 9. In the MLR models, n_ArX (number of X on aromatic ring) is the most important descriptor governing primary and ultimate biodegradation of organic chemicals. For the SVM models, determination coefficient (R²) values, cross-validated coefficients (Q²_LOO) and external validation coefficient (Q²_ext) values are over 0.9, indicating the SVM models have satisfactory goodness-of-fit, robustness and external predictive abilities. The applicability domains of these models were visualized by the Williams plot. The developed models can be used as effective tools to predict biodegradability of organic chemicals. © 2020 Elsevier Ltd. All rights reserved.
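The validation statistics named in the abstract (R², Q²_LOO, Q²_ext) and the leverage values behind a Williams plot can be illustrated with a minimal sketch. This is not the authors' code or data: the descriptors and responses are synthetic, only an MLR model is fitted (no SVM), Q²_ext uses the common Q²_F1 form, and the warning leverage h* = 3(p + 1)/n is a widely used convention. The abstract specifies neither of those last two choices, so both are assumptions, as are all function and variable names.

```python
import numpy as np

def design(X):
    """Prepend an intercept column to the descriptor matrix."""
    return np.column_stack([np.ones(len(X)), X])

def fit_mlr(X, y):
    """Ordinary least squares fit; returns [intercept, coefficients...]."""
    coef, *_ = np.linalg.lstsq(design(X), y, rcond=None)
    return coef

def predict_mlr(coef, X):
    return design(X) @ coef

def r2(y, y_hat):
    """Determination coefficient on the training set."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def q2_loo(X, y):
    """Leave-one-out cross-validated coefficient (1 - PRESS / SS_total)."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        coef = fit_mlr(X[keep], y[keep])
        press += (y[i] - predict_mlr(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def q2_ext(y_train, y_ext, y_ext_hat):
    """External validation coefficient, Q2_F1 form (assumed, see lead-in)."""
    return 1.0 - np.sum((y_ext - y_ext_hat) ** 2) / np.sum((y_ext - y_train.mean()) ** 2)

def leverages(X_train, X_query):
    """Leverage h_i = x_i^T (X^T X)^-1 x_i, with the intercept column included."""
    A, Q = design(X_train), design(X_query)
    inv = np.linalg.inv(A.T @ A)
    return np.einsum("ij,jk,ik->i", Q, inv, Q)

# Synthetic stand-in data: 3 hypothetical descriptors, linear response plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = 3.0 + X @ np.array([-0.5, 0.3, 0.1]) + rng.normal(scale=0.05, size=60)
X_train, y_train, X_ext, y_ext = X[:45], y[:45], X[45:], y[45:]

coef = fit_mlr(X_train, y_train)
r2_train = r2(y_train, predict_mlr(coef, X_train))
q2_loo_train = q2_loo(X_train, y_train)
q2_ext_val = q2_ext(y_train, y_ext, predict_mlr(coef, X_ext))

# Williams plot ingredients: leverage versus standardized residual.
n_train, p = X_train.shape
h_train = leverages(X_train, X_train)
h_ext = leverages(X_train, X_ext)
h_star = 3.0 * (p + 1) / n_train  # warning leverage (assumed convention)
resid_train = y_train - predict_mlr(coef, X_train)
s = np.sqrt(np.sum(resid_train ** 2) / (n_train - p - 1))
std_resid_ext = (y_ext - predict_mlr(coef, X_ext)) / s
outside_domain = (h_ext > h_star) | (np.abs(std_resid_ext) > 3.0)

print(f"R2={r2_train:.3f}  Q2_LOO={q2_loo_train:.3f}  Q2_ext={q2_ext_val:.3f}")
print(f"h*={h_star:.3f}  external compounds outside domain: {int(outside_domain.sum())}")
```

Plotting `std_resid_ext` against `h_ext`, with a vertical line at `h_star` and horizontal lines at ±3, gives the usual Williams plot layout. Compounds beyond either bound are typically treated as outside the applicability domain.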
Pages: 7