Regression trees modeling of time series for air pollution analysis and forecasting

Cited by: 32
Authors
Gocheva-Ilieva, Snezhana Georgieva [1]
Voynikova, Desislava Stoyanova [1]
Stoimenova, Maya Plamenova [1]
Ivanov, Atanas Valev [1]
Iliev, Iliycho Petkov [2]
Affiliations
[1] Univ Plovdiv Paisii Hilendarski, Fac Math & Informat, Dept Appl Math & Modeling, 24 Tzar Asen St, Plovdiv 4000, Bulgaria
[2] Tech Univ Sofia, Branch Plovdiv, Dept Phys, 25 Tzanko Djusstabanov St, Plovdiv 4000, Bulgaria
Keywords
Air pollution modeling; Time series; Classification and regression trees (CART); Pollution forecast; 62M10; 62M20; 62P12; PARTICULATE MATTER; SPLINES; PM10; CLASSIFICATION; ROBUSTIFICATION; POLLUTANTS; MORTALITY; QUALITY; RCMARS; OZONE
DOI
10.1007/s00521-019-04432-1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Solving the problems related to air pollution is crucial for human health and the ecosystems in many urban areas throughout the world. The accumulation of large arrays of data with measurements of various air pollutants makes it possible to analyze these data in order to predict and control pollution. This study presents a common approach for building quality nonlinear models of environmental time series by using the powerful data mining technique of classification and regression trees (CART). The predictors used for modeling are time series with meteorological, atmospheric or other data, date-time variables, and lagged variables of the dependent variable and of the predictors, included as groups. The proposed approach is tested in empirical studies of the daily average concentrations of atmospheric PM10 (particulate matter 10 μm in diameter) in the cities of Ruse and Pernik, Bulgaria. One-day-ahead forecasts are obtained. All models are cross-validated against overfitting. The best models are selected using goodness-of-fit measures, such as root-mean-square error and coefficient of determination. The relative importance of the predictors and predictor groups is obtained and interpreted. The CART models are compared with the corresponding models built by using ARIMA transfer function methodology, and the superiority of CART over ARIMA is demonstrated. The practical applicability of the models is assessed using 2×2 contingency tables. The results show that the CART models fit the data well and correctly predict about 90% of the measured values of PM10 with respect to the average daily European threshold value of 50 μg/m³.
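The evaluation measures named in the abstract (root-mean-square error, coefficient of determination, and a 2×2 contingency table against the 50 μg/m³ threshold) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the strict "greater than" exceedance rule, and the six observed/predicted values are assumptions made up for illustration only.

```python
import math

# Daily average PM10 threshold quoted in the abstract, in μg/m³.
EU_DAILY_LIMIT = 50.0


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def contingency_2x2(observed, predicted, threshold=EU_DAILY_LIMIT):
    """Count threshold exceedances: observed vs. predicted.

    Assumption: a day counts as an exceedance when the value is
    strictly greater than the threshold.
    """
    table = {"hit": 0, "miss": 0, "false_alarm": 0, "correct_negative": 0}
    for o, p in zip(observed, predicted):
        obs_exceeds = o > threshold
        pred_exceeds = p > threshold
        if obs_exceeds and pred_exceeds:
            table["hit"] += 1
        elif obs_exceeds:
            table["miss"] += 1
        elif pred_exceeds:
            table["false_alarm"] += 1
        else:
            table["correct_negative"] += 1
    return table


def proportion_correct(table):
    """Share of days classified on the correct side of the threshold."""
    return (table["hit"] + table["correct_negative"]) / sum(table.values())


# Hypothetical values for illustration only, not data from the paper.
observed = [30.0, 55.0, 70.0, 45.0, 20.0, 52.0]
predicted = [28.0, 60.0, 48.0, 51.0, 25.0, 58.0]

table = contingency_2x2(observed, predicted)
print(table)
print(round(proportion_correct(table), 3))
print(round(rmse(observed, predicted), 3), round(r_squared(observed, predicted), 3))
```

The proportion of correctly classified days is one way to read a figure such as "about 90% of measured values" predicted correctly with respect to the threshold; the exact measures derived from the contingency tables in the paper may differ.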
Pages: 9023-9039
Number of pages: 17