Improved Machine Learning Models by Data Processing for Predicting Life-Cycle Environmental Impacts of Chemicals

Cited by: 19
Authors
You, Shijie [1]
Sun, Ye [1]
Wang, Xiuheng [1]
Ren, Nanqi [1]
Liu, Yanbiao [2]
Affiliations
[1] Harbin Inst Technol, Sch Environm, State Key Lab Urban Water Resource & Environm, Harbin 150090, Peoples R China
[2] Donghua Univ, Coll Environm Sci & Engn, Text Pollut Controlling Engn Ctr Minist Ecol & Env, Shanghai 201620, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
life cycle assessment (LCA); machine learning; data processing; feature selection; weighted Euclidean distance; neural network
DOI
10.1021/acs.est.2c04945
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
Machine learning (ML) provides an efficient means of rapidly predicting the life-cycle environmental impacts of chemicals, but challenges remain because of the low prediction accuracy and poor interpretability of the models. To address these issues, we focused on data processing. We used a mutual information-permutation importance (MI-PI) feature selection method to filter out irrelevant molecular descriptors from the input data, which improved model interpretability by preserving the physicochemical meanings of the original molecular descriptors without generating new variables. We also applied a weighted Euclidean distance method to mine the data most relevant to the predicted targets by quantifying the contribution of each feature, thereby improving prediction accuracy. On the basis of this data processing, we developed artificial neural network (ANN) models for predicting the life-cycle environmental impacts of chemicals, with R² values of 0.81, 0.81, 0.84, 0.75, 0.73, and 0.86 for global warming, human health, metal depletion, freshwater ecotoxicity, particulate matter formation, and terrestrial acidification, respectively. The ML models were interpreted using the Shapley additive explanation method, which quantifies the contribution of each input molecular descriptor to each environmental impact category. This work suggests that combining feature selection by MI-PI with source data selection based on weighted Euclidean distance is a promising way to improve the accuracy and interpretability of models for predicting the life-cycle environmental impacts of chemicals.
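The weighted Euclidean distance step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `weighted_euclidean` and `select_nearest`, the toy descriptor vectors, and the weights are all hypothetical, and the abstract does not state how the per-feature weights were derived. The sketch assumes the common form d(x, y) = sqrt(sum_i w_i (x_i - y_i)^2) and keeps the k source samples closest to a target chemical.

```python
import math


def weighted_euclidean(x, y, weights):
    """Distance sqrt(sum_i w_i * (x_i - y_i)^2) between two feature vectors."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def select_nearest(target, candidates, weights, k):
    """Return the indices of the k candidates closest to target, nearest first."""
    ranked = sorted(
        range(len(candidates)),
        key=lambda i: weighted_euclidean(target, candidates[i], weights),
    )
    return ranked[:k]


# Hypothetical feature weights (assumed here to reflect feature contributions).
weights = [0.7, 0.2, 0.1]

# Hypothetical descriptor vectors for a target chemical and three source chemicals.
target = [1.0, 0.0, 0.0]
candidates = [
    [1.0, 5.0, 5.0],  # matches on the heavily weighted feature, far on the others
    [3.0, 0.0, 0.0],  # differs only on the heavily weighted feature
    [1.1, 0.2, 0.1],  # close on every feature
]

nearest = select_nearest(target, candidates, weights, k=2)
print(nearest)
```

With these toy values the third candidate ranks first, since it is close on every feature. In practice the descriptors would normally be scaled to comparable ranges before distances are computed, otherwise the weights and the raw units are confounded.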
Pages: 3434-3444
Number of pages: 11
Related papers
50 in total
  • [31] Utilizing patient data: A tutorial on predicting second cancer with machine learning models
    Sadeghi, Hossein
    Seif, Fatemeh
    Farahani, Erfan Hatamabadi
    Khanmohammadi, Soraya
    Nahidinezhad, Shahla
    CANCER MEDICINE, 2024, 13 (18)
  • [32] From data to diagnosis: evaluation of machine learning models in predicting kidney stones
    Orlando Iparraguirre-Villanueva
    George Paucar-Palomino
    Cleoge Paulino-Moreno
    Neural Computing and Applications, 2025, 37 (15) : 9049 - 9062
  • [33] A Comparative Study of Machine Learning Models for Predicting Meteorological Data in Agricultural Applications
    Suljug, Jelena
    Spisic, Josip
    Grgic, Kresimir
    Zagar, Drago
    ELECTRONICS, 2024, 13 (16)
  • [34] Predicting depression in old age: Combining life course data with machine learning
    Montorsi, Carlotta
    Fusco, Alessio
    Van Kerm, Philippe
    Bordas, Stephane P. A.
    ECONOMICS & HUMAN BIOLOGY, 2024, 52
  • [35] Predicting pesticide dissipation half-life intervals in plants with machine learning models
    Shen, Yike
    Zhao, Ercheng
    Zhang, Wei
    Baccarelli, Andrea A.
    Gao, Feng
    JOURNAL OF HAZARDOUS MATERIALS, 2022, 436
  • [36] Machine learning framework for predicting the low cycle fatigue life of lead-free solders
    Long, Xu
    Lu, Changheng
    Su, Yutai
    Dai, Yecheng
    ENGINEERING FAILURE ANALYSIS, 2023, 148
  • [37] Predicting Odor Sensory Attributes of Unidentified Chemicals in Water Using Fragmentation Mass Spectra with Machine Learning Models
    Huang, Yuanxi
    Bu, Lingjun
    Huang, Kuan
    Zhang, Huichun
    Zhou, Shiqing
    ENVIRONMENTAL SCIENCE & TECHNOLOGY, 2024, 58 (26) : 11504 - 11513
  • [38] Using Machine Learning Models and Actual Transaction Data for Predicting Real Estate Prices
    Pai, Ping-Feng
    Wang, Wen-Chang
    APPLIED SCIENCES-BASEL, 2020, 10 (17)
  • [39] Building life-span prediction for life cycle assessment and life cycle cost using machine learning: A big data approach
    Ji, Sukwon
    Lee, Bumho
    Yi, Mun Yong
    BUILDING AND ENVIRONMENT, 2021, 205
  • [40] Predicting Mouse Liver Microsomal Stability with “Pruned” Machine Learning Models and Public Data
    Alexander L. Perryman
    Thomas P. Stratton
    Sean Ekins
    Joel S. Freundlich
    Pharmaceutical Research, 2016, 33 : 433 - 449