Improved Machine Learning Models by Data Processing for Predicting Life-Cycle Environmental Impacts of Chemicals

Cited by: 19
Authors
You, Shijie [1 ]
Sun, Ye [1 ]
Wang, Xiuheng [1 ]
Ren, Nanqi [1 ]
Liu, Yanbiao [2 ]
Affiliations
[1] Harbin Inst Technol, Sch Environm, State Key Lab Urban Water Resource & Environm, Harbin 150090, Peoples R China
[2] Donghua Univ, Coll Environm Sci & Engn, Text Pollut Controlling Engn Ctr Minist Ecol & Env, Shanghai 201620, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
life cycle assessment (LCA); machine learning; data processing; feature selection; weighted Euclidean distance; neural network
DOI
10.1021/acs.est.2c04945
Chinese Library Classification
X [Environmental Science, Safety Science];
Discipline classification code
08 ; 0830 ;
Abstract
Machine learning (ML) offers an efficient means of rapidly predicting the life-cycle environmental impacts of chemicals, but challenges remain because of the low prediction accuracy and poor interpretability of the models. To address these issues, we focused on data processing. We used a mutual information-permutation importance (MI-PI) feature selection method to filter out irrelevant molecular descriptors from the input data, which improved model interpretability by preserving the physicochemical meanings of the original molecular descriptors without generating new variables. We also applied a weighted Euclidean distance method to mine the data most relevant to the predicted targets by quantifying the contribution of each feature, thereby improving prediction accuracy. On the basis of this data processing, we developed artificial neural network (ANN) models for predicting the life-cycle environmental impacts of chemicals, with R² values of 0.81, 0.81, 0.84, 0.75, 0.73, and 0.86 for global warming, human health, metal depletion, freshwater ecotoxicity, particulate matter formation, and terrestrial acidification, respectively. The ML models were interpreted using the Shapley additive explanation method, which quantifies the contribution of each input molecular descriptor to each environmental impact category. This work suggests that combining MI-PI feature selection with source data selection based on weighted Euclidean distance has promising potential to improve the accuracy and interpretability of models for predicting the life-cycle environmental impacts of chemicals.
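The source data selection step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact weighting scheme, so this example assumes the weights are per-descriptor importance scores and that descriptors are already scaled. The function names (`weighted_euclidean`, `select_nearest`), the descriptor vectors, and the weight values are all hypothetical and are not taken from the paper.

```python
import math


def weighted_euclidean(x, y, weights):
    """Weighted Euclidean distance: sqrt(sum_i w_i * (x_i - y_i)**2)."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y, and weights must have the same length")
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def select_nearest(target, candidates, weights, k):
    """Return the indices of the k candidates closest to the target."""
    order = sorted(
        range(len(candidates)),
        key=lambda i: weighted_euclidean(target, candidates[i], weights),
    )
    return order[:k]


# Hypothetical, already-scaled descriptor vectors (3 descriptors per chemical).
training = [
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 3.0],
    [2.0, 2.0, 0.0],
]
target = [0.0, 0.0, 1.0]

# Hypothetical importance weights (assumption): the third descriptor
# contributes little, so differences along it are penalized less.
weights = [0.6, 0.3, 0.1]

nearest = select_nearest(target, training, weights, k=2)
print(nearest)  # → [0, 2]
```

With equal weights the two nearest training chemicals would be indices 0 and 1. Down-weighting the third descriptor instead selects indices 0 and 2, which shows how feature contributions can change which source data count as most relevant to a target.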
Pages: 3434 - 3444
Page count: 11
Related papers
50 in total
  • [1] Predicting environmental impacts of smallholder wheat production by coupling life cycle assessment and machine learning
    Yu, Chunxiao
    Xu, Gang
    Cai, Ming
    Li, Yuan
    Wang, Lijia
    Zhang, Yan
    Lin, Huilong
    SCIENCE OF THE TOTAL ENVIRONMENT, 2024, 921
  • [2] A framework of developing machine learning models for facility life-cycle cost analysis
    Gao, Xinghua
    Pishdad-Bozorgi, Pardis
    BUILDING RESEARCH AND INFORMATION, 2020, 48 (05) : 501 - 525
  • [3] Machine learning models for predicting endocrine disruption potential of environmental chemicals
    Chierici, Marco
    Giulini, Marco
    Bussola, Nicole
    Jurman, Giuseppe
    Furlanello, Cesare
    JOURNAL OF ENVIRONMENTAL SCIENCE AND HEALTH PART C-ENVIRONMENTAL CARCINOGENESIS & ECOTOXICOLOGY REVIEWS, 2018, 36 (04) : 237 - 251
  • [4] Predicting product life cycle environmental impacts with machine learning: Uncertainties and implications for future reporting requirements
    Baehr, Julian
    Koyamparambath, Anish
    Dos Reis, Eduardo
    Weyand, Steffi
    Binnig, Carsten
    Schebek, Liselotte
    Sonnemann, Guido
    Sustainable Production and Consumption, 2024, 52 : 511 - 526
  • [5] Machine Learning Applications in Facility Life-Cycle Cost Analysis: A Review
    Gao, Xinghua
    Pishdad-Bozorgi, Pardis
    Shelden, Dennis R.
    Hu, Yuqing
    COMPUTING IN CIVIL ENGINEERING 2019: SMART CITIES, SUSTAINABILITY, AND RESILIENCE, 2019 : 267 - 274
  • [6] Airfield pavement condition prediction with machine learning models for life-cycle cost analysis
    Clemmensen, April
    Wang, Hao
    INTERNATIONAL JOURNAL OF PAVEMENT ENGINEERING, 2024, 25 (01)
  • [7] Projecting life-cycle environmental impacts of corn production in the US Midwest under future climate scenarios using a machine learning approach
    Lee, Eun Kyung
    Zhang, Wang-Jian
    Zhang, Xuesong
    Adler, Paul R.
    Lin, Shao
    Feingold, Beth J.
    Khwaja, Haider A.
    Romeiko, Xiaobo X.
    SCIENCE OF THE TOTAL ENVIRONMENT, 2020, 714
  • [8] Estimate ecotoxicity characterization factors for chemicals in life cycle assessment using machine learning models
    Hou, Ping
    Jolliet, Olivier
    Zhu, Ji
    Xu, Ming
    ENVIRONMENT INTERNATIONAL, 2020, 135
  • [9] Assessing the life-cycle environmental impacts of the wood pallet sector in the United States
    Alanya-Rosenbaum, S.
    Bergman, R. D.
    Gething, B.
    JOURNAL OF CLEANER PRODUCTION, 2021, 320
  • [10] Examining the life-cycle environmental impacts of desalination: A case study in the State of Qatar
    Mannan, Mehzabeen
    Alhaj, Mohamed
    Mabrouk, Abdel Nasser
    Al-Ghamdi, Sami G.
    DESALINATION, 2019, 452 : 238 - 246