Evaluating traditional versus ensemble machine learning methods for predicting missing data of daily PM10 concentration

Cited by: 4

Authors:

Kalantari, Elham [1]

Gholami, Hamid [1]

Malakooti, Hossein [2]

Eftekhari, Mahdi [3]

Saneei, Poorya [3]

Esfandiarpour, Donya [3]

Moosavi, Vahid [4]

Nafarzadegan, Ali Reza [1]

Affiliations:

[1] Univ Hormozgan, Dept Nat Resources Engn, Bandar Abbas, Hormozgan, Iran

[2] Univ Hormozgan, Fac Marine Sci & Technol, Dept Marine & Atmospher Sci Non Biol, Bandar Abbas, Iran

[3] Shahid Bahonar Univ Kerman, Dept Comp Engn, Kerman, Iran

[4] Tarbiat Modares Univ, Dept Watershed Management Engn, Noor, Mazandaran, Iran

Source:

ATMOSPHERIC POLLUTION RESEARCH | 2024 / Vol. 15 / Issue 05

Keywords:

Machine learning; PM10 prediction; XGBoost; Time series; Zabol; ARTIFICIAL NEURAL-NETWORKS; AIR; INTERPOLATION; EMISSIONS

DOI:

10.1016/j.apr.2024.102063

Chinese Library Classification (CLC) number:

X [Environmental Science, Safety Science]

Discipline classification code:

08; 0830

Abstract:

This study aimed to predict missing PM10 data for the city of Zabol using traditional learning methods, including lazy learning, and ensemble learning. Daily minimum, average, and maximum values of weather variables were collected, along with daily PM10 concentrations, from the Zabol airport weather station for the years 2013-2022. Model performance was compared using the coefficient of determination (R²), mean absolute error (MAE), and mean squared error (MSE). The reconstruction results show that ensemble learning models, especially XGBoost, can effectively predict missing PM10 data in time series. Among ensemble learning methods, boosting algorithms predicted missing PM10 data more accurately than bagging algorithms. Among the traditional learning methods, lazy learning models performed better than eager learning models. In descending order of efficiency and accuracy for predicting missing PM10 data, the models were XGBoost, random forest (RF), Extra Trees (ET), Light Gradient Boosting Machine (LightGBM), the Decision Tree regressor with the Bagging method, gradient boosting (GB), AdaBoost, Weighted K-Nearest Neighbor (WKNN), K-Nearest Neighbor (KNN), the Decision Tree regressor with the Pasting method, artificial neural network (ANN), Decision Tree (DT), and linear regression (LR). Given the high processing capability and potential of ensemble learning methods for predicting missing PM10 data, this technique is considered a useful way to save the time, energy, and cost of collecting and measuring data. It can also fill in data lost to equipment malfunction or damage, and it can be used to predict pollutant concentrations in weather systems.
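The abstract names three evaluation criteria (R², MAE, MSE) and a lazy learner, Weighted K-Nearest Neighbor. The sketch below shows one common way these could be computed when a missing PM10 value is filled in from weather variables. It is an illustration only, not the authors' code: the inverse-distance weighting, the function names, the two features, and all numbers are assumptions made for the example.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def wknn_predict(train_X, train_y, query, k=3, eps=1e-9):
    """Weighted KNN regression with inverse-distance weights (an assumed
    weighting scheme). Being a lazy learner, it builds no model up front
    and scans the stored days only when a prediction is requested."""
    nearest = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )[:k]
    weights = [1.0 / (d + eps) for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)

# Hypothetical toy data: (wind speed in m/s, mean temperature in deg C)
# for days with a recorded PM10 value. Not taken from the study.
known_X = [(2.0, 18.0), (9.5, 31.0), (4.0, 22.0), (12.0, 35.0), (3.0, 20.0)]
known_y = [60.0, 410.0, 120.0, 650.0, 85.0]

# Days whose PM10 is treated as missing; the true values are held back
# so the reconstruction can be scored.
missing_X = [(3.5, 21.0), (10.0, 33.0)]
held_out_y = [100.0, 480.0]

filled = [wknn_predict(known_X, known_y, q, k=3) for q in missing_X]
print("filled:", filled)
print("MAE:", mae(held_out_y, filled))
print("MSE:", mse(held_out_y, filled))
print("R2:", r2(held_out_y, filled))
```

In practice the features would normally be standardized before computing distances, since variables on larger scales otherwise dominate the neighbor search.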

Pages: 11
