Development of models predicting biodegradation rate rating with multiple linear regression and support vector machine algorithms

Cited by: 45
Authors
Tang, Weihao [1]
Li, Yanying [1]
Yu, Yang [2]
Wang, Zhongyu [1]
Xu, Tong [1]
Chen, Jingwen [1]
Lin, Jun [2]
Li, Xuehua [1]
Affiliations
[1] Dalian Univ Technol, Sch Environm Sci & Technol, Key Lab Ind Ecol & Environm Engn MOE, Dalian 116024, Peoples R China
[2] Minist Ecol & Environm MEE, Solid Waste & Chem Management Ctr, Beijing 100029, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Biodegradability; Quantitative structure-activity relationship; Multiple linear regression; Support vector machine; Molecular structure descriptors; AEROBIC BIODEGRADATION; READY BIODEGRADABILITY; BIOACCUMULATIVE ORGANICS; CHEMICALS; PERSISTENT; QSAR; POLLUTANTS;
DOI
10.1016/j.chemosphere.2020.126666
Chinese Library Classification number
X [Environmental Science, Safety Science];
Discipline classification code
08; 0830;
Abstract
Biodegradation is a significant process for removing organic chemicals from water, soil and sediment environments, and biodegradability is therefore critical for evaluating the environmental persistence of organic chemicals. In this study, based on a dataset of 171 compounds, four quantitative structure-activity relationship (QSAR) models were developed for predicting primary and ultimate biodegradation rate ratings with multiple linear regression (MLR) and support vector machine (SVM) algorithms. Two MLR models were built on the compounds with at most 9 carbon atoms, and two SVM models were built on the compounds with more than 9 carbon atoms. In the MLR models, nArX (number of X on aromatic ring) is the most important descriptor governing primary and ultimate biodegradation of organic chemicals. For the SVM models, the coefficient of determination (R²), the leave-one-out cross-validated coefficient (Q²(LOO)) and the external validation coefficient (Q²(ext)) are all above 0.9, indicating that the SVM models have satisfactory goodness-of-fit, robustness and external predictive ability. The applicability domains of these models were visualized with Williams plots. The developed models can be used as effective tools to predict the biodegradability of organic chemicals. © 2020 Elsevier Ltd. All rights reserved.
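The validation statistics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: it uses synthetic data, a plain least-squares MLR fit, and conventions that are common in QSAR work but are assumptions here, namely an external Q² referenced to the training-set mean and a Williams-plot warning leverage of h* = 3(p + 1)/n. All function and variable names are hypothetical.

```python
import numpy as np

def design(X):
    """Prepend an intercept column to the descriptor matrix."""
    return np.column_stack([np.ones(len(X)), X])

def fit_mlr(X, y):
    """Ordinary least-squares fit; returns intercept and slopes."""
    coef, *_ = np.linalg.lstsq(design(X), y, rcond=None)
    return coef

def predict_mlr(coef, X):
    return design(X) @ coef

def r2(y, y_hat):
    """Coefficient of determination (goodness-of-fit)."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def q2_loo(X, y):
    """Leave-one-out Q^2: refit without each compound, then predict it."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        preds[i] = predict_mlr(fit_mlr(X[keep], y[keep]), X[i:i + 1])[0]
    return 1.0 - np.sum((y - preds) ** 2) / np.sum((y - y.mean()) ** 2)

def q2_ext(y_train, y_test, y_test_hat):
    """External Q^2, referenced to the training mean (assumed convention)."""
    ss_res = np.sum((y_test - y_test_hat) ** 2)
    return 1.0 - ss_res / np.sum((y_test - y_train.mean()) ** 2)

def leverages(X_train, X_query):
    """Leverage h_i = x_i (A^T A)^-1 x_i^T, the x-axis of a Williams plot."""
    A, Q = design(X_train), design(X_query)
    core = np.linalg.pinv(A.T @ A)
    return np.einsum("ij,jk,ik->i", Q, core, Q)

# Synthetic stand-in data: 40 "compounds", 2 descriptors (not the paper's data).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = 3.0 - 0.8 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.1, size=40)
X_train, y_train, X_test, y_test = X[:30], y[:30], X[30:], y[30:]

coef = fit_mlr(X_train, y_train)
r2_train = r2(y_train, predict_mlr(coef, X_train))
q2_cv = q2_loo(X_train, y_train)
q2_external = q2_ext(y_train, y_test, predict_mlr(coef, X_test))

# Applicability domain: flag query compounds whose leverage exceeds h*.
h_train = leverages(X_train, X_train)
h_test = leverages(X_train, X_test)
h_star = 3 * (X_train.shape[1] + 1) / len(X_train)
outside_domain = h_test > h_star

print(f"R2={r2_train:.3f}  Q2_LOO={q2_cv:.3f}  Q2_ext={q2_external:.3f}")
print(f"h*={h_star:.3f}  test compounds outside domain: {int(outside_domain.sum())}")
```

A full Williams plot would also put standardized residuals on the y-axis. The same three statistics apply to an SVM regressor by swapping the fit and predict functions, although the leverage calculation above is specific to the linear descriptor space.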
Pages: 7