Improved Machine Learning Models by Data Processing for Predicting Life-Cycle Environmental Impacts of Chemicals

Cited by: 19
Authors
You, Shijie [1]
Sun, Ye [1]
Wang, Xiuheng [1]
Ren, Nanqi [1]
Liu, Yanbiao [2]
Affiliations
[1] Harbin Inst Technol, Sch Environm, State Key Lab Urban Water Resource & Environm, Harbin 150090, Peoples R China
[2] Donghua Univ, Coll Environm Sci & Engn, Text Pollut Controlling Engn Ctr Minist Ecol & Env, Shanghai 201620, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
life cycle assessment (LCA); machine learning; data processing; feature selection; weighted Euclidean distance; neural network
DOI
10.1021/acs.est.2c04945
Chinese Library Classification (CLC)
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
Machine learning (ML) offers an efficient means of rapidly predicting the life-cycle environmental impacts of chemicals, but challenges remain because of the low prediction accuracy and poor interpretability of the models. To address these issues, we focused on data processing. We used a mutual information-permutation importance (MI-PI) feature selection method to filter out irrelevant molecular descriptors from the input data, which improved model interpretability by preserving the physicochemical meanings of the original molecular descriptors without generating new variables. We also applied a weighted Euclidean distance method to mine the data most relevant to the predicted targets by quantifying the contribution of each feature, thereby improving prediction accuracy. On the basis of this data processing, we developed artificial neural network (ANN) models for predicting the life-cycle environmental impacts of chemicals, with R² values of 0.81, 0.81, 0.84, 0.75, 0.73, and 0.86 for global warming, human health, metal depletion, freshwater ecotoxicity, particulate matter formation, and terrestrial acidification, respectively. The ML models were interpreted using the Shapley additive explanation method, which quantifies the contribution of each input molecular descriptor to each environmental impact category. This work suggests that combining MI-PI feature selection with source data selection based on weighted Euclidean distance is a promising way to improve the accuracy and interpretability of models for predicting the life-cycle environmental impacts of chemicals.
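The weighted Euclidean distance step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact formula, so the form used here (squared feature differences scaled by normalized weights) is an assumption, as are the function names `weighted_euclidean_distance` and `select_nearest`, the toy descriptor matrix, and the weight values, which merely stand in for feature contributions such as those a permutation-importance step might yield.

```python
import numpy as np


def weighted_euclidean_distance(X, x_target, weights):
    """Distance from each row of X to x_target, scaling each feature by its weight.

    Assumed form: d_i = sqrt(sum_j w_j * (X_ij - t_j)^2), with weights normalized
    to sum to 1. This is an illustrative choice, not the paper's stated formula.
    """
    X = np.asarray(X, dtype=float)
    x_target = np.asarray(x_target, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    diff = X - x_target
    return np.sqrt((w * diff ** 2).sum(axis=1))


def select_nearest(X, x_target, weights, k):
    """Indices of the k training samples closest to the target chemical."""
    d = weighted_euclidean_distance(X, x_target, weights)
    return np.argsort(d, kind="stable")[:k]


# Hypothetical toy data: 4 chemicals described by 2 molecular descriptors.
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
target = np.array([0.0, 0.0])
# Hypothetical feature contributions: descriptor 0 matters far more than descriptor 1.
importances = np.array([0.9, 0.1])

distances = weighted_euclidean_distance(X_train, target, importances)
nearest = select_nearest(X_train, target, importances, k=2)
```

With these weights, a unit difference in the heavily weighted descriptor counts for more than the same difference in the lightly weighted one, so sample 2 ranks closer to the target than sample 1 even though both lie at the same unweighted distance.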
Pages: 3434-3444
Page count: 11