An integrated feature selection and machine learning framework for PM10 concentration prediction

Cited by: 2
Authors
Kalantari, Elham [1]
Gholami, Hamid [1]
Malakooti, Hossein [2]
Kaskaoutis, Dimitris G. [3, 4]
Saneei, Poorya [5]
Affiliations
[1] Univ Hormozgan, Dept Nat Resources Engn, Bandar Abbas, Hormozgan, Iran
[2] Univ Hormozgan, Fac Marine Sci & Technol, Dept Marine & Atmospher Sci Non Biol, Bandar Abbas, Iran
[3] Univ Western Macedonia, Dept Chem Engn, Kozani 50100, Greece
[4] Inst Environm Res & Sustainable Dev, Natl Observ Athens, Athens 15236, Greece
[5] Iran Univ Sci & Technol, Dept Comp Engn, Tehran, Iran
Keywords
Air pollution; Feature selection; Machine learning; PM10; Dust; Zabol; DUST STORMS; PM2.5; CONCENTRATIONS; PARTICULATE MATTER; RIDGE-REGRESSION; SISTAN REGION; COMPONENT ANALYSIS; POLLUTION; MORTALITY; CANCER; IRAN
DOI
10.1016/j.apr.2025.102456
Chinese Library Classification
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
The Sistan Basin in eastern Iran is a major dust source, presenting significant atmospheric, ecological, socio-economic, and health challenges. This study employed machine learning (ML) algorithms, including Random Forest (RF), K-Nearest Neighbor (KNN), Weighted K-Nearest Neighbor (WKNN), Support Vector Regression (SVR), and Least Absolute Shrinkage and Selection Operator (LASSO), to model and predict PM10 concentrations in Zabol City (2013-2022), using meteorological variables such as temperature, relative humidity, and wind speed and direction as independent predictors. Feature selection methods - Filter (Information Gain, F-Test, Correlation Coefficient), Wrapper (Recursive Feature Elimination, Sequential Forward/Backward Selection), and Embedded (LASSO, Elastic Net, Ridge Regression, RF Importance) - were applied to identify significant predictors, with embedded methods providing the best balance of simplicity, accuracy, and cost-efficiency. Among the models, RF achieved the highest seasonal performance (R² = 0.75), in summer. RF's prediction R² values for PM10 remained above 0.5 in all seasons, consistently outperforming the other models. The WKNN model performed reasonably well across all seasons, ranking second among the models, while the LASSO model performed more weakly. The SVR model performed satisfactorily in specific seasons, such as summer and autumn. All models performed better during summer. Importantly, the models relied solely on readily available meteorological data, enabling accurate predictions of PM10 in this arid region of eastern Iran. The findings highlight the potential of ML techniques for developing air pollution prediction and warning systems, offering valuable support to policymakers in designing effective pollution control strategies and safeguarding public health.
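As an illustration of two ideas named in the abstract, the sketch below implements a weighted k-nearest-neighbor regressor and the R² score in plain Python. It is not the authors' implementation: the inverse-distance weighting, the value k = 5, the helper names `wknn_predict` and `r2_score`, and the synthetic "PM10-like" data are all assumptions made for the example, since the record does not specify them.

```python
import math
import random


def wknn_predict(train_X, train_y, query, k=5, eps=1e-9):
    """Weighted KNN regression: average the k nearest targets,
    weighting each by 1 / distance (an assumed weighting scheme)."""
    nearest = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )[:k]
    weights = [1.0 / (d + eps) for d, _ in nearest]
    weighted_sum = sum(w * y for w, (_, y) in zip(weights, nearest))
    return weighted_sum / sum(weights)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic data only (NOT the Zabol measurements): three features already
# scaled to [0, 1], standing in for temperature, humidity and wind speed.
rng = random.Random(0)


def make_sample():
    temp, humidity, wind = rng.random(), rng.random(), rng.random()
    # Hypothetical relationship chosen purely for illustration.
    target = 100.0 + 300.0 * wind - 80.0 * humidity + rng.gauss(0.0, 10.0)
    return (temp, humidity, wind), target


train = [make_sample() for _ in range(300)]
test = [make_sample() for _ in range(100)]
train_X, train_y = zip(*train)
test_X, test_y = zip(*test)

predictions = [wknn_predict(train_X, train_y, x) for x in test_X]
r2 = r2_score(test_y, predictions)
print(f"R2 on synthetic hold-out data: {r2:.2f}")
```

Because KNN relies on distances, features on different scales would normally be standardized first; here the synthetic features are generated on a common [0, 1] range to keep the sketch short.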
Pages: 19