Improved Machine Learning Models by Data Processing for Predicting Life-Cycle Environmental Impacts of Chemicals

Cited by: 19
Authors
You, Shijie [1]
Sun, Ye [1]
Wang, Xiuheng [1]
Ren, Nanqi [1]
Liu, Yanbiao [2]
Affiliations
[1] Harbin Inst Technol, Sch Environm, State Key Lab Urban Water Resource & Environm, Harbin 150090, Peoples R China
[2] Donghua Univ, Coll Environm Sci & Engn, Text Pollut Controlling Engn Ctr Minist Ecol & Env, Shanghai 201620, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
life cycle assessment (LCA); machine learning; data processing; feature selection; weighted Euclidean distance; FEATURE-SELECTION; NEURAL-NETWORK
DOI
10.1021/acs.est.2c04945
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
Machine learning (ML) provides an efficient means of rapidly predicting the life-cycle environmental impacts of chemicals, but challenges remain due to the low prediction accuracy and poor interpretability of the models. To address these issues, we focused on data processing. We used a mutual information-permutation importance (MI-PI) feature selection method to filter out irrelevant molecular descriptors from the input data, which improved model interpretability by preserving the physicochemical meanings of the original molecular descriptors without generating new variables. We also applied a weighted Euclidean distance method to mine the data most relevant to the predicted targets by quantifying the contribution of each feature, thereby improving prediction accuracy. On the basis of this data processing, we developed artificial neural network (ANN) models for predicting the life-cycle environmental impacts of chemicals, with R² values of 0.81, 0.81, 0.84, 0.75, 0.73, and 0.86 for global warming, human health, metal depletion, freshwater ecotoxicity, particulate matter formation, and terrestrial acidification, respectively. The ML models were interpreted using the Shapley additive explanation method, which quantifies the contribution of each input molecular descriptor to each environmental impact category. This work suggests that combining feature selection by MI-PI with source data selection based on weighted Euclidean distance is a promising way to improve the accuracy and interpretability of models for predicting the life-cycle environmental impacts of chemicals.
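The abstract does not give the formula or the weighting scheme behind the weighted Euclidean distance step, so the following is only a minimal sketch of the general idea: each feature's squared difference is scaled by a weight before summing, and the source records nearest to a target are kept. The function names, the weights, and the toy descriptor vectors are all hypothetical and are not taken from the paper.

```python
import math


def weighted_euclidean(x, y, weights):
    """Distance between two descriptor vectors, one non-negative weight per feature."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def select_nearest(target, candidates, weights, k):
    """Return the indices of the k candidates closest to the target, nearest first."""
    ranked = sorted(
        range(len(candidates)),
        key=lambda i: weighted_euclidean(target, candidates[i], weights),
    )
    return ranked[:k]


# Hypothetical feature weights (assumed here to come from a feature-importance
# score normalised to sum to 1) and toy descriptor vectors.
weights = [0.7, 0.2, 0.1]
target = [1.0, 0.0, 0.0]
candidates = [
    [1.0, 1.0, 1.0],  # matches the target on the heavily weighted feature
    [0.0, 0.0, 0.0],  # differs only on the heavily weighted feature
    [1.1, 0.1, 0.0],  # close on every feature
]

nearest = select_nearest(target, candidates, weights, k=2)
print(nearest)
```

With these toy weights, the first candidate ranks ahead of the second even though it is farther away in the unweighted sense, because its differences fall on low-weight features.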
Pages: 3434-3444
Number of pages: 11