A PM2.5 prediction model based on deep learning and random forest

Cited by: 0
Authors
Peng H. [1]
Zhou Y. [1]
Hu X. [1]
Zhang L. [2]
Peng Y. [1]
Cai X. [1]
Affiliations
[1] Institute of Geospatial Information, Information Engineering University, Zhengzhou
[2] Beijing Institute of Remote Sensing Information, Beijing
Keywords
Deep Learning; LSTM; PM2.5; PMCOM; Random Forest; remote sensing
DOI
10.11834/jrs.20210504
Abstract
Environmental pollution in China remains severe, and regional compound air pollution dominated by PM2.5 is its most prominent form. Aerosol Optical Depth (AOD) is a key physical quantity for characterizing atmospheric turbidity; it represents the strength of light extinction by aerosols. Many studies have shown a strong correlation between AOD and PM2.5. Using AOD data from satellite remote sensing, combined with other influencing factors, to analyze how PM2.5 changes is therefore of great significance for air pollution prevention and the protection of human health. The diffusion of PM2.5 is an extremely complicated process, and estimating PM2.5 is a complex multivariable nonlinear problem, whereas prediction models based on statistical regression can describe only relatively simple nonlinear relationships. Compared with statistical regression models and with PM2.5 prediction models based on traditional machine learning, models based on deep learning can mine deep features hidden in historical data. However, AOD remote sensing data are limited by the temporal resolution of the imagery and by cloud contamination of pixels, which greatly reduces the amount of valid data. Because deep learning depends on a large amount of training data, a shortage of training data seriously degrades model accuracy. To address both problems, namely that traditional machine learning algorithms cannot deeply mine the hidden associations in data and that deep learning algorithms perform poorly when data are scarce, a combined PM2.5 prediction model based on deep learning and random forest is proposed. The model builds its training dataset from AOD remote sensing data, meteorological reanalysis data, and ground-based PM2.5 observations.
First, the deep learning model's feature extraction ability is used to extract the deep hidden features in the training data. The extracted hidden features are then used to train the random forest model, and the predicted PM2.5 concentration is obtained with the random forest regression algorithm. To verify the effectiveness of this method, a series of experiments was carried out. The results demonstrate that the combined model, PMCOM, achieves better prediction accuracy in both the overall and the seasonal prediction scenarios. In these experiments, the combination of random forest and a long short-term memory (LSTM) neural network performs best. Even when only 35% of the data are used for training, R² in the overall prediction experiment reaches 0.89, and R² in each seasonal prediction experiment is above 0.75. Combining deep learning with random forest uses the random forest to reduce the deep learning model's dependence on data volume while making full use of the high-level hidden features of the existing historical data. In this way, it compensates for the random forest model's weakness in mining the internal associations of the data and improves the prediction accuracy of PM2.5 concentration. © 2023 Science Press. All rights reserved.
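The two-stage idea described in the abstract (a neural network extracts hidden features, then a random forest regresses PM2.5 on those features) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes scikit-learn is available, substitutes a small feed-forward network (`MLPRegressor`) for the LSTM, and trains on synthetic data. The feature columns, the helper names (`make_synthetic_data`, `hidden_features`, `fit_combined_model`, `predict_combined`), and all hyperparameters are hypothetical. The 35% training split mirrors the proportion quoted in the abstract.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler


def make_synthetic_data(n_samples=600, seed=0):
    """Hypothetical stand-in for an AOD + meteorology -> PM2.5 dataset."""
    rng = np.random.default_rng(seed)
    # Illustrative columns only: AOD, temperature, humidity, wind speed.
    X = rng.normal(size=(n_samples, 4))
    y = 30.0 + 20.0 * X[:, 0] + 5.0 * X[:, 1] * X[:, 2] - 8.0 * np.tanh(X[:, 3])
    y += rng.normal(scale=2.0, size=n_samples)
    return X, y


def hidden_features(net, X):
    """First hidden layer's ReLU activations of a fitted MLPRegressor."""
    return np.maximum(0.0, X @ net.coefs_[0] + net.intercepts_[0])


def fit_combined_model(X_train, y_train, seed=0):
    """Stage 1: fit the network. Stage 2: fit a forest on its hidden features."""
    scaler = StandardScaler().fit(X_train)
    X_scaled = scaler.transform(X_train)
    # Assumption: a one-hidden-layer MLP replaces the LSTM used in the paper.
    net = MLPRegressor(hidden_layer_sizes=(32,), activation="relu",
                       max_iter=2000, random_state=seed)
    net.fit(X_scaled, y_train)
    forest = RandomForestRegressor(n_estimators=200, random_state=seed)
    forest.fit(hidden_features(net, X_scaled), y_train)
    return scaler, net, forest


def predict_combined(model, X):
    """Map inputs through the scaler and network, then predict with the forest."""
    scaler, net, forest = model
    return forest.predict(hidden_features(net, scaler.transform(X)))


X, y = make_synthetic_data()
# Train on 35% of the samples and evaluate on the remainder.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.35, random_state=0)
model = fit_combined_model(X_train, y_train)
y_pred = predict_combined(model, X_test)
r2 = r2_score(y_test, y_pred)
print(f"R^2 on held-out synthetic data: {r2:.3f}")
```

The score printed here describes only the synthetic data and says nothing about the results reported in the paper. A sequence model such as an LSTM would additionally need the inputs arranged as time windows, which this sketch omits.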
Pages: 430-440
Number of pages: 10