Application of the Lasso regularisation technique in mitigating overfitting in air quality prediction models

Cited by: 0
Authors
Pak, Abbas [1]
Rad, Abdullah Kaviani [2]
Nematollahi, Mohammad Javad [3]
Mahmoudi, Mohammadreza [4]
Affiliations
[1] Shahrekord Univ, Dept Comp Sci, Shahrekord, Iran
[2] Shiraz Univ, Coll Agr, Dept Environm Engn & Nat Resources, Shiraz 7194685111, Iran
[3] Urmia Univ, Fac Sci, Dept Geol, Orumiyeh 5756151818, Iran
[4] Fasa Univ, Fac Sci, Dept Stat, Fasa 7461686131, Iran
Source
SCIENTIFIC REPORTS | 2025, Vol. 15, Issue 01
Keywords
Air pollution; Air quality prediction; Overfitting; Lasso regularisation; Machine learning; PM2.5; POLLUTION; OZONE; O-3; TEMPERATURE; IMPACT; NO2; CO;
DOI
10.1038/s41598-024-84342-y
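Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification code
07; 0710; 09;
Abstract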
As a significant global concern, air pollution triggers enormous challenges in public health and ecological sustainability, necessitating the development of precise algorithms to forecast and mitigate its impacts, which has led to the development of many machine learning (ML)-based models for predicting air quality. Meanwhile, overfitting is a prevalent issue with ML algorithms that decreases their efficacy and generalizability. The present investigation, using an extensive collection of data from 16 sensors in Tehran, Iran, from 2013 to 2023, focuses on applying the Least Absolute Shrinkage and Selection Operator (Lasso) regularisation technique to enhance the forecasting precision of ambient air pollutants concentration models, including particulate matter (PM2.5 and PM10), CO, NO2, SO2, and O3 while decreasing overfitting. The outputs were compared using the R-squared (R2), mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE), and normalised mean square error (NMSE) indices. Despite the preliminary findings revealing that Lasso dramatically enhances model reliability by decreasing overfitting and determining key attributes, the model's performance in predicting gaseous pollutants against PM remained unsatisfactory (R2 = 0.80 for PM2.5, 0.75 for PM10, 0.45 for CO, 0.55 for NO2, 0.65 for SO2, and 0.35 for O3). The minimal degree of missing data presumably explained the strong performance of the PM model, while the high dynamism of gases and their chemical interactions, in conjunction with the inherent characteristics of the model, were the primary factors contributing to the poor performance of the model. Simultaneously, the successful implementation of the Lasso regularisation approach in mitigating overfitting and selecting more important features makes it highly suggested for application in air quality forecasting models.
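
For orientation only, the two ideas the abstract relies on (an L1 penalty that shrinks unimportant coefficients to exactly zero, and the five comparison indices) can be sketched in a few lines of Python. This is a minimal illustration on synthetic data, not the authors' pipeline or data: the function names, the penalty value `lam`, the coordinate-descent solver, and the synthetic features are all assumptions made for the example. The record does not define NMSE, so it is assumed here to be MSE divided by the variance of the observations, which makes R2 equal to 1 - NMSE.

```python
import random


def soft_threshold(rho, lam):
    """Shrink rho towards zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (z / n) if z > 0 else 0.0
    return w


def regression_metrics(y_true, y_pred):
    """R2, MAE, MSE, RMSE and NMSE (assumed here: MSE / variance of y_true)."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    variance = sum((t - mean_y) ** 2 for t in y_true) / n
    nmse = mse / variance
    return {"R2": 1.0 - nmse, "MAE": mae, "MSE": mse,
            "RMSE": mse ** 0.5, "NMSE": nmse}


# Synthetic stand-in data (hypothetical): only the first two of five
# features actually influence the target.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(5)] for _ in range(200)]
y = [3.0 * row[0] - 2.0 * row[1] + random.gauss(0, 0.1) for row in X]

# Fit on the first 150 rows, evaluate on the held-out last 50.
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

weights = lasso_coordinate_descent(X_train, y_train, lam=0.1)
predictions = [sum(xi * wi for xi, wi in zip(row, weights)) for row in X_test]
metrics = regression_metrics(y_test, predictions)

print("weights:", [round(w, 3) for w in weights])
print("held-out metrics:", {k: round(v, 3) for k, v in metrics.items()})
```

In this toy setting the penalty zeroes the coefficients of the three irrelevant features while slightly shrinking the two informative ones, which is the feature-selection behaviour the abstract attributes to Lasso. Scoring on held-out rows rather than the training rows is what makes the indices informative about overfitting.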
Pages: 17