An integrated feature selection and machine learning framework for PM10 concentration prediction

Cited by: 0
Authors
Kalantari, Elham [1]
Gholami, Hamid [1]
Malakooti, Hossein [2]
Kaskaoutis, Dimitris G. [3, 4]
Saneei, Poorya [5]
Affiliations
[1] Univ Hormozgan, Dept Nat Resources Engn, Bandar Abbas, Hormozgan, Iran
[2] Univ Hormozgan, Fac Marine Sci & Technol, Dept Marine & Atmospher Sci Non Biol, Bandar Abbas, Iran
[3] Univ Western Macedonia, Dept Chem Engn, Kozani 50100, Greece
[4] Inst Environm Res & Sustainable Dev, Natl Observ Athens, Athens 15236, Greece
[5] Iran Univ Sci & Technol, Dept Comp Engn, Tehran, Iran
Keywords
Air pollution; Feature selection; Machine learning; PM10; Dust; Zabol; DUST STORMS; PM2.5; CONCENTRATIONS; PARTICULATE MATTER; RIDGE-REGRESSION; SISTAN REGION; COMPONENT ANALYSIS; POLLUTION; MORTALITY; CANCER; IRAN
DOI
10.1016/j.apr.2025.102456
Chinese Library Classification (CLC)
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
The Sistan Basin in eastern Iran is a major dust source, presenting significant atmospheric, ecological, socio-economic, and health challenges. This study employed machine learning (ML) algorithms, including Random Forest (RF), K-Nearest Neighbor (KNN), Weighted K-Nearest Neighbor (WKNN), Support Vector Regression (SVR), and Least Absolute Shrinkage and Selection Operator (LASSO), to model and predict PM10 concentrations in Zabol City (2013-2022), using meteorological variables such as temperature, relative humidity, and wind speed and direction as independent predictors. Feature selection methods - Filter (Information Gain, F-Test, Correlation Coefficient), Wrapper (Recursive Feature Elimination, Sequential Forward/Backward Selection), and Embedded (LASSO, Elastic Net, Ridge Regression, RF Importance) - were applied to identify significant predictors, with embedded methods providing the best balance of simplicity, accuracy, and cost-efficiency. Among the models, RF achieved the highest seasonal performance (R² = 0.75), during summer. RF's prediction R² values for PM10 remained above 0.5 in all seasons, consistently outperforming the other models. The WKNN model performed reasonably well across all seasons, ranking second among the models, while the LASSO model performed more weakly. The SVR model showed satisfactory performance in specific seasons, such as summer and autumn. All models performed best during summer. Importantly, the models relied solely on readily available meteorological data, enabling accurate predictions of PM10 in this arid region of eastern Iran. The findings highlight the potential of ML techniques for developing air pollution prediction and warning systems, offering valuable support to policymakers in designing effective pollution control strategies and safeguarding public health.
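The abstract does not describe the authors' implementation, so the following is only a minimal, standard-library sketch of two of the techniques it names: a filter-style selection by correlation coefficient, followed by distance-weighted KNN (WKNN) regression. Everything in it is an assumption made for illustration: the data are synthetic, the features are taken to be already min-max scaled to [0, 1], the coefficients linking them to PM10 are invented, and the function names (`select_by_correlation`, `wknn_predict`, `r2_score`) are hypothetical.

```python
import math
import random

# Hypothetical feature names, loosely following the variables listed in the abstract.
FEATURES = ["temperature", "relative_humidity", "wind_speed", "wind_direction"]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def select_by_correlation(rows, target, names, top_k):
    """Filter-style selection: keep the top_k features ranked by |r| with the target."""
    scores = {
        name: abs(pearson([row[j] for row in rows], target))
        for j, name in enumerate(names)
    }
    return sorted(names, key=lambda name: scores[name], reverse=True)[:top_k]


def wknn_predict(train_x, train_y, query, k=5, eps=1e-9):
    """Weighted KNN regression: inverse-distance-weighted mean of the k nearest targets."""
    nearest = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))[:k]
    weights = [1.0 / (d + eps) for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


def r2_score(y_true, y_pred):
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic data (assumption): PM10 rises with wind speed and temperature, falls with
# humidity, and does not depend on wind direction. These coefficients are invented.
rng = random.Random(0)
rows = [[rng.random() for _ in FEATURES] for _ in range(300)]
target = [50 * r[0] - 40 * r[1] + 100 * r[2] + rng.gauss(0, 5) for r in rows]

# Select features on the training split only, so the hold-out rows stay unseen.
n_train = 240
selected = select_by_correlation(rows[:n_train], target[:n_train], FEATURES, top_k=3)
cols = [FEATURES.index(name) for name in selected]
points = [tuple(row[j] for j in cols) for row in rows]

train_x, test_x = points[:n_train], points[n_train:]
train_y, test_y = target[:n_train], target[n_train:]
preds = [wknn_predict(train_x, train_y, q, k=5) for q in test_x]
r2 = r2_score(test_y, preds)

print("selected features:", selected)
print("hold-out R^2:", round(r2, 3))
```

With real station data, the features would need to be scaled before computing distances, and a seasonal evaluation like the one reported in the abstract would require splitting the records by season first.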
Pages: 19
Related papers
50 in total
  • [41] Machine Learning Techniques for PM10 Levels Forecast in Bogota
    Mejia Martinez, Nicolas
    Melissa Montes, Laura
    Mura, Ivan
    Felipe Franco, Juan
    2018 ICAI WORKSHOPS (ICAIW), 2018
  • [42] PM10 Concentration Forecast Based on Wavelet Support Vector Machine
    Li, Yong
    Tao, Yan
    2017 INTERNATIONAL CONFERENCE ON SENSING, DIAGNOSTICS, PROGNOSTICS, AND CONTROL (SDPC), 2017, : 383 - 386
  • [43] Prediction of short and medium term PM10 concentration using artificial neural networks
    Schornobay-Lui, Elaine
    Alexandrina, Eduardo Carlos
    Aguiar, Monica Lopes
    Hanisch, Werner Siegfried
    Correa, Edinalda Moreira
    Correa, Nivaldo Aparecido
    MANAGEMENT OF ENVIRONMENTAL QUALITY, 2019, 30 (02) : 414 - 436
  • [44] Prediction of air pollution and analysis of its effects on the pollution dispersion of PM10 in Egypt using machine learning algorithms
    Hanna, Wael K.
    Elstohy, Rasha
    Radwan, Nouran M.
    INTERNATIONAL JOURNAL OF DATA MINING MODELLING AND MANAGEMENT, 2022, 14 (04) : 358 - 371
  • [45] Performance of Bayesian Model Averaging (BMA) for Short-Term Prediction of PM10 Concentration in the Peninsular Malaysia
    Ramli, Norazrin
    Hamid, Hazrul Abdul
    Yahaya, Ahmad Shukri
    Ul-Saufie, Ahmad Zia
    Noor, Norazian Mohamed
    Abu Seman, Nor Amirah
    Kamarudzaman, Ain Nihla
    Deak, Gyoergy
    ATMOSPHERE, 2023, 14 (02)
  • [46] Monitoring and analysis of PM10 concentration at Delhi Metro construction sites
    Mishra, Rajeev Kumar
    Joshi, Tarun
    Goel, Nikhil
    Gupta, Himanshu
    Kumar, Amrit
    INTERNATIONAL JOURNAL OF ENVIRONMENT AND POLLUTION, 2015, 57 (1-2) : 27 - 37
  • [47] Determination of atmospheric PM10 concentration in Kandy in relation to traffic intensity
    Elangasinghe, M. A.
    Shanthini, R.
    JOURNAL OF THE NATIONAL SCIENCE FOUNDATION OF SRI LANKA, 2008, 36 (03) : 245 - 249
  • [48] Using Machine Learning and Feature Selection for Alfalfa Yield Prediction
    Whitmire, Christopher D.
    Vance, Jonathan M.
    Rasheed, Hend K.
    Missaoui, Ali
    Rasheed, Khaled M.
    Maier, Frederick W.
    AI, 2021, 2 (01) : 71 - 88
  • [49] Comparison of four machine learning methods for predicting PM10 concentrations in Helsinki, Finland
    Zickus, M
    Greig, AJ
    Niranjan, M
    URBAN AIR QUALITY - RECENT ADVANCES, PROCEEDINGS, 2002, : 717 - 729
  • [50] An integrated approach to identify the origin of PM10 exceedances
    Amodio, M.
    Andriani, E.
    de Gennaro, G.
    Loiotile, A. Demarinis
    Di Gilio, A.
    Placentino, M. C.
    ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH, 2012, 19 (08) : 3132 - 3141