Data selection using support vector regression

Cited by: 0
Authors
Michael B. Richman
Lance M. Leslie
Theodore B. Trafalis
Hicham Mansouri
Affiliations
[1] University of Oklahoma, School of Meteorology and Cooperative Institute for Mesoscale Meteorological Studies
[2] University of Oklahoma, School of Industrial and Systems Engineering
[3] Power Costs, Inc.
Source
Advances in Atmospheric Sciences | 2015 / Vol. 32
Keywords
data selection; data thinning; machine learning; support vector regression; Voronoi tessellation; pipeline methods
DOI
Not available
Abstract
Geophysical data sets are growing at an ever-increasing rate, requiring computationally efficient data selection (thinning) methods to preserve essential information. Satellites, such as WindSat, provide large data sets for assessing the accuracy and computational efficiency of data selection techniques. A new data thinning technique, based on support vector regression (SVR), is developed and tested. To manage large online satellite data streams, observations from WindSat are formed into subsets by Voronoi tessellation and then each subset is thinned by SVR (TSVR). Three experiments are performed. The first confirms the viability of TSVR for a relatively small sample, comparing it to several commonly used data thinning methods (random selection, averaging and Barnes filtering), producing a 10% thinning rate (90% data reduction), low mean absolute errors (MAE) and large correlations with the original data. A second experiment, using a larger data set, shows TSVR retrievals with MAE < 1 m s⁻¹ and correlations ⩾ 0.98. TSVR was an order of magnitude faster than the commonly used thinning methods. A third experiment applies a two-stage pipeline to TSVR, to accommodate online data. The pipeline subsets reconstruct the wind field with the same accuracy as in the second experiment, and are an order of magnitude faster than the non-pipeline TSVR. Therefore, pipeline TSVR is two orders of magnitude faster than commonly used thinning methods that ingest the entire data set. This study demonstrates that TSVR pipeline thinning is an accurate and computationally efficient alternative to commonly used data selection techniques.
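The subset-then-thin workflow and the two accuracy scores named in the abstract (MAE and correlation with the original data) can be illustrated with a minimal sketch. It is not the authors' TSVR implementation: the abstract does not specify how SVR selects observations, so random selection (one of the baseline methods mentioned) stands in for the per-cell thinning step, and a nearest-retained-observation lookup stands in for the wind-field reconstruction. All function names, the synthetic wind-speed field, the 16 seeds and the sample size are assumptions made for illustration only.

```python
import math
import random


def voronoi_subsets(points, seeds):
    """Group point indices by nearest seed.

    Assigning each point to its nearest seed is equivalent to asking which
    Voronoi cell of the seed set contains that point.
    """
    cells = [[] for _ in seeds]
    for idx, (x, y) in enumerate(points):
        nearest = min(
            range(len(seeds)),
            key=lambda k: (x - seeds[k][0]) ** 2 + (y - seeds[k][1]) ** 2,
        )
        cells[nearest].append(idx)
    return cells


def thin_random(indices, rate, rng):
    """Keep roughly `rate` of the indices in one cell (baseline stand-in;
    an SVR-based selector would replace this function)."""
    if not indices:
        return []
    n_keep = max(1, round(rate * len(indices)))
    return sorted(rng.sample(indices, n_keep))


def reconstruct(points, values, kept):
    """Estimate every observation from its nearest retained observation."""
    estimates = []
    for x, y in points:
        j = min(kept, key=lambda k: (x - points[k][0]) ** 2 + (y - points[k][1]) ** 2)
        estimates.append(values[j])
    return estimates


def mean_absolute_error(truth, estimate):
    return sum(abs(t - e) for t, e in zip(truth, estimate)) / len(truth)


def pearson_correlation(a, b):
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Synthetic "wind speed" observations (assumed data, not WindSat).
rng = random.Random(0)
points = [(rng.uniform(0, 2 * math.pi), rng.uniform(0, 2 * math.pi)) for _ in range(2000)]
values = [8.0 + 3.0 * math.sin(x) * math.cos(y) for x, y in points]

# Stage 1: split the observations into Voronoi subsets around 16 seeds.
seeds = rng.sample(points, 16)
cells = voronoi_subsets(points, seeds)

# Stage 2: thin each subset independently to about 10% of its size.
kept = [i for cell in cells for i in thin_random(cell, 0.10, rng)]

# Score the thinned set against the full data.
reconstructed = reconstruct(points, values, kept)
mae = mean_absolute_error(values, reconstructed)
corr = pearson_correlation(values, reconstructed)
print(f"kept {len(kept)} of {len(points)} observations")
print(f"MAE = {mae:.3f}, correlation = {corr:.3f}")
```

Because each cell is thinned independently, the cells could in principle be processed as they arrive, which is the property a pipeline arrangement relies on; the sketch processes them sequentially for simplicity.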
Pages: 277-286
Page count: 9