An integrated feature selection and machine learning framework for PM10 concentration prediction

Cited by: 2
Authors
Kalantari, Elham [1]
Gholami, Hamid [1]
Malakooti, Hossein [2]
Kaskaoutis, Dimitris G. [3,4]
Saneei, Poorya [5]
Affiliations
[1] Univ Hormozgan, Dept Nat Resources Engn, Bandar Abbas, Hormozgan, Iran
[2] Univ Hormozgan, Fac Marine Sci & Technol, Dept Marine & Atmospher Sci Non Biol, Bandar Abbas, Iran
[3] Univ Western Macedonia, Dept Chem Engn, Kozani 50100, Greece
[4] Inst Environm Res & Sustainable Dev, Natl Observ Athens, Athens 15236, Greece
[5] Iran Univ Sci & Technol, Dept Comp Engn, Tehran, Iran
Keywords
Air pollution; Feature selection; Machine learning; PM10; Dust; Zabol; DUST STORMS; PM2.5; CONCENTRATIONS; PARTICULATE MATTER; RIDGE-REGRESSION; SISTAN REGION; COMPONENT ANALYSIS; POLLUTION; MORTALITY; CANCER; IRAN
DOI
10.1016/j.apr.2025.102456
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
The Sistan Basin in eastern Iran is a major dust source that poses significant atmospheric, ecological, socio-economic, and health challenges. This study employed machine learning (ML) algorithms, including Random Forest (RF), K-Nearest Neighbor (KNN), Weighted K-Nearest Neighbor (WKNN), Support Vector Regression (SVR), and Least Absolute Shrinkage and Selection Operator (LASSO), to model and predict PM10 concentrations in Zabol City (2013-2022), using meteorological variables such as temperature, relative humidity, wind speed, and wind direction as independent predictors. Three families of feature selection methods were applied to identify significant predictors: filter methods (Information Gain, F-Test, Correlation Coefficient), wrapper methods (Recursive Feature Elimination, Sequential Forward/Backward Selection), and embedded methods (LASSO, Elastic Net, Ridge Regression, RF Importance). The embedded methods provided the best balance of simplicity, accuracy, and cost-efficiency. Among the models, RF achieved the highest seasonal performance (R² = 0.75), in summer. RF's prediction R² values for PM10 remained above 0.5 in all seasons, consistently outperforming the other models. The WKNN model performed reasonably well across all seasons and ranked second among the models, whereas the LASSO model showed weaker performance. The SVR model performed satisfactorily in certain seasons, notably summer and autumn. All models performed better in summer than in the other seasons. Importantly, the models relied solely on readily available meteorological data, enabling accurate PM10 predictions in this arid region of eastern Iran. The findings highlight the potential of ML techniques for developing air pollution prediction and warning systems, offering valuable support to policymakers in designing effective pollution control strategies and safeguarding public health.
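The abstract names a filter-type selector (Correlation Coefficient), a Weighted K-Nearest Neighbor (WKNN) regressor, and R² as the skill score. The stdlib-only sketch below shows how those three pieces fit together under stated assumptions: ranking predictors by absolute Pearson r, z-score scaling, and inverse-distance weights 1/(d + ε) are common choices assumed here, not necessarily the authors' configuration. All function names are hypothetical, and the demo data are synthetic, not the Zabol observations.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no linear information
    return sxy / math.sqrt(sxx * syy)


def filter_select(features: Dict[str, List[float]], target: List[float], k: int) -> List[str]:
    """Filter method (assumed variant): keep the k predictors with the largest |r|."""
    ranked = sorted(features, key=lambda name: abs(pearson_r(features[name], target)), reverse=True)
    return ranked[:k]


def fit_scaler(rows: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Per-column mean and population std; zero-variance columns get std 1.0."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0 for c, m in zip(cols, means)]
    return means, stds


def transform(rows: List[List[float]], means: List[float], stds: List[float]) -> List[List[float]]:
    """Z-score each row so no single unit (m/s, %, degC) dominates the distance."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def wknn_predict(train_x: List[List[float]], train_y: List[float],
                 query: List[float], k: int = 3, eps: float = 1e-9) -> float:
    """Weighted KNN regression: inverse-distance-weighted mean of the k nearest targets."""
    nearest = sorted((math.dist(row, query), y) for row, y in zip(train_x, train_y))[:k]
    weights = [1.0 / (d + eps) for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Synthetic toy data (NOT the study's data): PM10 rises with wind speed and
    # falls with relative humidity; temperature is not used to generate PM10.
    wind = [float(i % 10 + 1) for i in range(40)]
    rh = [float(20 + (7 * i) % 50) for i in range(40)]
    temp = [25.0 + (i % 3) for i in range(40)]
    pm10 = [50 + 30 * w - 1.5 * h for w, h in zip(wind, rh)]
    feats = {"wind_speed": wind, "rel_humidity": rh, "temperature": temp}

    chosen = filter_select(feats, pm10, k=2)
    rows = [[feats[name][i] for name in chosen] for i in range(40)]
    train_x, test_x, train_y, test_y = rows[:30], rows[30:], pm10[:30], pm10[30:]

    means, stds = fit_scaler(train_x)  # scaler fitted on training rows only
    tx, qx = transform(train_x, means, stds), transform(test_x, means, stds)
    preds = [wknn_predict(tx, train_y, q, k=3) for q in qx]
    print("selected:", chosen, "test R2:", round(r2_score(test_y, preds), 3))
```

Filter ranking is cheap because it never fits a model; the wrapper and embedded families listed in the abstract need the predictor in the loop and are not sketched here.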
Pages: 19