Development of a stacked ensemble model for forecasting and analyzing daily average PM2.5 concentrations in Beijing, China

Cited by: 185
Authors
Zhai, Binxu [1, 2]
Chen, Jianguo [1, 2]
Affiliations
[1] Tsinghua Univ, Dept Engn Phys, Beijing 100084, Peoples R China
[2] Tsinghua Univ, Beijing Key Lab City Integrated Emergency Respons, Beijing 100084, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Air quality forecast; Feature extraction; Feature selection; Stacked generalization strategy; Feature importance analysis; FINE PARTICULATE MATTER; HIDDEN MARKOV MODEL; REGIONAL TRANSPORT; AIR-POLLUTION; SEVERE HAZE; NEURAL-NETWORK; GLOBAL BURDEN; OZONE LEVELS; JANUARY 2013; WINTER HAZE;
DOI
10.1016/j.scitotenv.2018.04.040
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science];
Discipline classification code
08; 0830;
Abstract
A stacked ensemble model is developed for forecasting and analyzing the daily average concentrations of fine particulate matter (PM2.5) in Beijing, China. Special feature extraction procedures, including those of simplification, polynomial, transformation and combination, are conducted before modeling to identify potentially significant features based on an exploratory data analysis. Stability feature selection and tree-based feature selection methods are applied to select important variables and evaluate the degrees of feature importance. Single models including LASSO, AdaBoost, XGBoost and multi-layer perceptron optimized by the genetic algorithm (GA-MLP) are established in the level 0 space and are then integrated by support vector regression (SVR) in the level 1 space via stacked generalization. A feature importance analysis reveals that nitrogen dioxide (NO2) and carbon monoxide (CO) concentrations measured from the city of Zhangjiakou are taken as the most important elements of pollution factors for forecasting PM2.5 concentrations. Local extreme wind speeds and maximal wind speeds are considered to extend the most effects of meteorological factors to the cross-regional transportation of contaminants. Pollutants found in the cities of Zhangjiakou and Chengde have a stronger impact on air quality in Beijing than other surrounding factors. Our model evaluation shows that the ensemble model generally performs better than a single nonlinear forecasting model when applied to new data with a coefficient of determination (R²) of 0.90 and a root mean squared error (RMSE) of 23.69 μg/m³. For single pollutant grade recognition, the proposed model performs better when applied to days characterized by good air quality than when applied to days registering high levels of pollution. The overall classification accuracy level is 73.93%, with most misclassifications made among adjacent categories. The results demonstrate the interpretability and generalizability of the stacked ensemble model. © 2018 The Authors. Published by Elsevier B.V.
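The two-level layout described in the abstract can be illustrated with a minimal sketch, assuming scikit-learn is available. This is not the authors' implementation: the data are synthetic stand-ins for the pollutant and meteorological features, `GradientBoostingRegressor` is swapped in for XGBoost, a plain `MLPRegressor` is used without any genetic-algorithm tuning, and all hyperparameters are illustrative guesses.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (AdaBoostRegressor, GradientBoostingRegressor,
                              StackingRegressor)
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic data (assumption): stands in for the engineered daily features.
X, y = make_regression(n_samples=400, n_features=12, n_informative=6,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Level 0: four heterogeneous base learners.
level0 = [
    ("lasso", make_pipeline(StandardScaler(), Lasso(alpha=0.1))),
    ("adaboost", AdaBoostRegressor(n_estimators=50, random_state=0)),
    # Stand-in for XGBoost, to avoid an extra dependency.
    ("gbrt", GradientBoostingRegressor(n_estimators=100, random_state=0)),
    # Plain MLP; the paper tunes its MLP with a genetic algorithm.
    ("mlp", make_pipeline(StandardScaler(),
                          MLPRegressor(hidden_layer_sizes=(32,),
                                       max_iter=1000, random_state=0))),
]

# Level 1: SVR trained on the base learners' out-of-fold predictions.
level1 = make_pipeline(StandardScaler(), SVR(kernel="linear", C=10.0))

# cv=5 means each base learner's training-set predictions come from
# 5-fold cross-validation, so the meta-learner never sees in-sample fits.
stack = StackingRegressor(estimators=level0, final_estimator=level1, cv=5)
stack.fit(X_train, y_train)

# Evaluate on held-out data with the two metrics named in the abstract.
y_pred = stack.predict(X_test)
r2 = r2_score(y_test, y_pred)
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
print(f"R^2 = {r2:.3f}, RMSE = {rmse:.3f}")
```

Training the level 1 learner on out-of-fold predictions is the core of stacked generalization: it lets the combiner weight each base model by how well it generalizes, not by how well it fits the training set. The numbers printed here come from synthetic data and say nothing about the results reported in the abstract.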
Pages: 644-658
Page count: 15