Identifying a suitable model for predicting hourly pollutant concentrations by using low-cost microstation data and machine learning

Cited by: 0
Authors
Rongjin Yang
Lizeyan Yin
Xuejie Hao
Lu Liu
Chen Wang
Xiuhong Li
Qiang Liu
Affiliations
[1] Chinese Research Academy of Environmental Sciences, State Key Laboratory of Remote Sensing Science, College of Global Change and Earth System Science
[2] Beijing Normal University
[3] Higher Institute of Computer Modeling and Their Applications
[4] Clermont Auvergne University
Source
Scientific Reports | Vol. 12
Keywords
DOI
Not available
CLC number
Subject classification number
Abstract
Accurately predicting the concentration of PM2.5 (fine particles with a diameter of 2.5 μm or less) is essential for health risk assessment and for formulating air pollution control strategies. Large amounts of air pollution data are now available, and efficiently mining their hidden features to forecast future pollutant concentrations is important for the prevention and control of air pollution. We therefore build pollutant prediction models based on the Light Gradient Boosting Machine (LightGBM), a shallow machine learning method, and the Long Short-Term Memory (LSTM) neural network. First, PM2.5 concentration data from 34 air quality stations in Beijing were matched in time and space with data from 18 weather stations to obtain an input data set. The input data set was then cleaned and preprocessed, and the training set was obtained through input feature extraction, input factor normalization, and outlier processing. Hourly PM2.5 concentrations were predicted in experiments using hourly data for Beijing from January 1, 2018 to October 1, 2020, and the best hourly series prediction results were identified by comparing the two models. The comparison shows that the RMSE of the LSTM model for each pollutant is nearly 50% lower than that of LightGBM, and that the LSTM predictions follow the curve of the actual observations more closely. An exploration of the input step size of the LSTM model found that 3-h input data gave higher accuracy than 12-h input data. The model can be used for management and decision-making by environmental protection departments and for formulating preventive measures against emergency pollution incidents.
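The preprocessing and evaluation steps named in the abstract (input normalization, a fixed input step size such as 3 h or 12 h, and RMSE as the comparison metric) can be illustrated with a minimal sketch. This is not the authors' code: the function names `min_max_normalize`, `make_windows`, and `rmse` are hypothetical, min-max scaling is an assumed choice of normalization, and the hourly values are synthetic placeholders rather than station data.

```python
import numpy as np


def min_max_normalize(x):
    """Scale each column to [0, 1]; constant columns map to 0 (assumed scheme)."""
    x = np.asarray(x, dtype=float)
    lo = x.min(axis=0)
    span = x.max(axis=0) - lo
    span = np.where(span == 0, 1.0, span)  # avoid division by zero
    return (x - lo) / span


def make_windows(series, n_steps):
    """Build (inputs, targets) pairs from an hourly series.

    Each input row holds n_steps consecutive hours; the target is the
    value of the hour that immediately follows that window.
    """
    series = np.asarray(series, dtype=float)
    n = len(series) - n_steps
    if n <= 0:
        raise ValueError("series must be longer than n_steps")
    inputs = np.stack([series[i:i + n_steps] for i in range(n)])
    targets = series[n_steps:]
    return inputs, targets


def rmse(y_true, y_pred):
    """Root mean squared error between observations and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Synthetic placeholder series: 24 hourly values, not real PM2.5 data.
hourly = np.arange(24, dtype=float)
x3, y3 = make_windows(hourly, 3)     # 3-h input step size
x12, y12 = make_windows(hourly, 12)  # 12-h input step size
```

With windows built this way, a shorter input step size yields more training samples from the same series (21 versus 12 here), which is one practical difference between the 3-h and 12-h settings; the accuracy difference reported in the abstract is an empirical result and is not something this sketch reproduces.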