PM2.5 Concentration Prediction Based on Spatiotemporal Feature Selection Using XGBoost-MSCNN-GA-LSTM

Cited by: 22
Authors
Dai, Hongbin [1]
Huang, Guangqiu [1]
Zeng, Huibin [1]
Yang, Fan [1]
Affiliation
[1] Xian Univ Architecture & Technol, Sch Management, Xian 710055, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
XGBoost; MSCNN; genetic algorithm; LSTM; feature selection; spatiotemporal feature extraction; NEURAL-NETWORKS; MODEL;
DOI
10.3390/su132112071
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science];
Discipline classification code
08 ; 0830 ;
Abstract
With China's rapid industrialization, air pollution is becoming increasingly serious. Predicting air quality is essential for identifying preventive measures that avoid negative impacts. Existing methods for predicting atmospheric pollutant concentrations ignore feature redundancy and spatio-temporal characteristics, so their accuracy is low and their transferability is weak. Therefore, extreme gradient boosting (XGBoost) is first applied to select features for PM2.5 prediction; one-dimensional multi-scale convolution kernels (MSCNN) are then used to extract local temporal and spatial feature relations from the air quality data, and the results are linearly concatenated and fused to obtain the spatio-temporal relationship among multiple features. Finally, XGBoost and MSCNN are combined with LSTM to exploit its strength in handling time series. A genetic algorithm (GA) is applied to optimize the parameter set of the long short-term memory (LSTM) network. The spatio-temporal relationship of the multiple features is input into the LSTM network, which outputs the long-term feature dependence of the selected features to predict PM2.5 concentration. On this basis, an XGBoost-MSCGL model for PM2.5 concentration prediction based on spatio-temporal feature selection is established. The data set consists of hourly concentration data for six atmospheric pollutants and meteorological data from the Fen-Wei Plain in 2020. To verify the effectiveness of the model, XGBoost-MSCGL is compared with benchmark models, namely multilayer perceptron (MLP), CNN, LSTM, XGBoost, and CNN-LSTM, before and after XGBoost feature selection. According to the forecast results for 12 cities, compared with the single models, the root mean square error (RMSE) decreased by about 39.07%, the average MAE decreased by about 42.18%, the average MAE decreased by about 49.33%, and R² increased by 23.7%.
Compared with the models after feature selection, the RMSE decreased by about 15% on average, the MAPE by 16%, and the MAE by 21%, while R² increased by 2.6%. The experimental results show that the XGBoost-MSCGL prediction model offers a more comprehensive understanding, operates at deeper levels, and achieves higher prediction accuracy and better generalization ability in predicting PM2.5 concentration.
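The abstract reports results as percentage changes in RMSE, MAE, MAPE, and R². As a minimal sketch of how such figures are typically computed (the function names, the use of the standard textbook definitions, and the relative-decrease formula are assumptions for illustration, not taken from the paper):

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.
    Assumes no observed value is zero."""
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def relative_decrease(baseline, candidate):
    """Percentage by which `candidate` is lower than `baseline`
    (hypothetical helper; assumes this is how the decreases are reported)."""
    return 100.0 * (baseline - candidate) / baseline
```

Under these definitions, a baseline RMSE of 5.0 and a model RMSE of 3.0 would be reported as a 40% decrease; whether the paper averages the metric first or averages the per-city decreases is not stated in the abstract.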
Pages: 24