Air pollution prediction with machine learning: a case study of Indian cities

Cited by: 118
Authors
Kumar, K. [1]
Pande, B. P. [2]
Affiliations
[1] Guru Nanak Dev Univ, Sikh Natl Coll, Amritsar, Punjab, India
[2] Govt PG Coll, Dept Comp Applicat, LSM, Pithoragarh, Uttarakhand, India
Keywords
Air quality index; Machine learning; Indian air quality data; Correlation-based feature selection; Exploratory data analysis; Box plot; Synthetic minority oversampling technique; HYDRODESULFURIZATION;
DOI
10.1007/s13762-022-04241-5
Chinese Library Classification number
X [Environmental science, safety science];
Discipline classification code
08 ; 0830 ;
Abstract
The survival of mankind cannot be imagined without air. Steady development in almost all realms of modern human society has adversely affected the health of the air. Daily industrial, transport, and domestic activities release hazardous pollutants into our environment. Monitoring and predicting air quality has become essential in this era, especially in developing countries like India. In contrast to traditional methods, prediction technologies based on machine learning have proved to be the most efficient tools for studying such modern hazards. The present work investigates six years of air pollution data from 23 Indian cities for air quality analysis and prediction. The dataset is preprocessed and key features are selected through correlation analysis. An exploratory data analysis is carried out to develop insights into hidden patterns in the dataset, and the pollutants directly affecting the air quality index are identified. A significant fall in almost all pollutants is observed in the pandemic year, 2020. The data imbalance problem is addressed with a resampling technique, and five machine learning models are employed to predict air quality. The results of these models are compared using standard metrics. The Gaussian Naive Bayes model achieves the highest accuracy, while the Support Vector Machine model exhibits the lowest accuracy. The performance of these models is evaluated and compared through established performance parameters. The XGBoost model performed best among the models and achieved the highest linearity between the predicted and actual data.
Pages: 5333-5348
Page count: 16