Air pollution prediction with machine learning: a case study of Indian cities

Cited by: 86
Authors
Kumar, K. [1]
Pande, B. P. [2]
Affiliations
[1] Guru Nanak Dev Univ, Sikh Natl Coll, Amritsar, Punjab, India
[2] Govt PG Coll, Dept Comp Applicat, LSM, Pithoragarh, Uttarakhand, India
Keywords
Air quality index; Machine learning; Indian air quality data; Correlation-based feature selection; Exploratory data analysis; Box plot; Synthetic minority oversampling technique; HYDRODESULFURIZATION
DOI
10.1007/s13762-022-04241-5
Chinese Library Classification number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
The survival of mankind cannot be imagined without air. Consistent developments in almost all realms of modern human society have adversely affected the health of the air. Daily industrial, transport, and domestic activities are releasing hazardous pollutants into our environment. Monitoring and predicting air quality has become essential in this era, especially in developing countries like India. In contrast to traditional methods, prediction technologies based on machine learning techniques have proved to be the most efficient tools for studying such modern hazards. The present work investigates six years of air pollution data from 23 Indian cities for air quality analysis and prediction. The dataset is preprocessed and key features are selected through correlation analysis. An exploratory data analysis is carried out to develop insights into hidden patterns in the dataset, and the pollutants directly affecting the air quality index are identified. A significant fall in almost all pollutants is observed in the pandemic year, 2020. The data imbalance problem is addressed with a resampling technique, and five machine learning models are employed to predict air quality. The results of these models are compared using standard metrics. The Gaussian Naive Bayes model achieves the highest accuracy, while the Support Vector Machine model exhibits the lowest accuracy. The performances of these models are evaluated and compared through established performance parameters. The XGBoost model performs best among the models and achieves the highest linearity between the predicted and actual data.
Pages: 5333-5348
Page count: 16