Handling high-dimensional data in air pollution forecasting tasks

Cited by: 8
Authors
Domanska, Diana [1 ,4 ]
Lukasik, Szymon [2 ,3 ]
Affiliations
[1] Univ Silesia, Inst Comp Sci, Ul Bedzinska 39, PL-41200 Sosnowiec, Poland
[2] Polish Acad Sci, Syst Res Inst, Ul Newelska 6, PL-01447 Warsaw, Poland
[3] AGH Univ Sci & Technol, Fac Phys & Appl Comp Sci, Al Mickiewicza 30, PL-30059 Krakow, Poland
[4] Univ Oslo, Dept Informat, POB 1072, N-0316 Oslo, Norway
Keywords
Big data; Multidimensional data; Dimensionality reduction; Fractional distances; Forecasting; Pollution; PRINCIPAL COMPONENT; FEATURE-SELECTION; REDUCTION; MODEL; PREDICTION; ALGORITHM; INDEX; PM10
DOI
10.1016/j.ecoinf.2016.04.007
Chinese Library Classification number
Q14 [Ecology (Bioecology)]
Discipline classification code
071012; 0713
Abstract
This paper proposes methods for handling the high-dimensional weather forecast data used to predict concentrations of PM10, PM2.5, SO2, NO, CO and O3. Pollution prediction normally requires historical data samples for a large number of points in time, in particular weather forecast data, actual weather data and pollution data. It also typically involves numerous features related to atmospheric conditions. Consequently, analysing such datasets to generate accurate forecasts becomes a very cumbersome task. The paper examines a variety of unsupervised dimensionality reduction methods aimed at obtaining a compact yet informative set of features. As an alternative, an approach using fractional distances for data analysis tasks is also considered. Both strategies were evaluated on real-world data obtained from the Institute of Meteorology and Water Management in Katowice (Poland), with the extended Air Pollution Forecast Model (e-APFM) used as the underlying prediction tool. Employing a fractional distance as the dissimilarity measure was found to give the best forecasting accuracy. Satisfactory results can also be obtained with Isomap, Landmark Isomap and Factor Analysis as dimensionality reduction techniques. These methods can also be used to formulate a universal mapping that is ready to use for data gathered in different geographical areas. © 2016 Elsevier B.V. All rights reserved.
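As an illustration of the fractional distance mentioned in the abstract, the sketch below computes a Minkowski-type dissimilarity with an exponent between 0 and 1. The abstract does not give the exact formulation or exponent used with e-APFM, so the function name `fractional_distance`, the default `p=0.5`, and the sample vectors are assumptions chosen for illustration only.

```python
def fractional_distance(x, y, p=0.5):
    """Minkowski-type dissimilarity with a fractional exponent 0 < p < 1.

    Hypothetical helper: the name and the default p are illustrative
    assumptions, not taken from the paper.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    # Sum the p-th powers of the absolute coordinate differences,
    # then take the (1/p)-th power of the total.
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


# Two made-up feature vectors (not real meteorological data).
a = [0.0, 0.0]
b = [1.0, 1.0]
print(fractional_distance(a, b, p=0.5))
```

With an exponent below 1 the triangle inequality no longer holds, so the result is a dissimilarity measure rather than a metric in the strict sense.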
Pages: 70-91
Number of pages: 22