Data Selection Using Support Vector Regression

Cited by: 0
Authors
Michael B. Richman [1]
Lance M. Leslie [1]
Theodore B. Trafalis [2]
Hicham Mansouri [3]
Affiliations
[1] School of Meteorology and Cooperative Institute for Mesoscale Meteorological Studies, University of Oklahoma, Norman, Oklahoma, USA
[2] School of Industrial and Systems Engineering, University of Oklahoma, Norman, Oklahoma, USA
[3] Power Costs, Inc., David L. Boren Blvd, Suite , Norman, Oklahoma, USA
Keywords
data selection; data thinning; machine learning; support vector regression; Voronoi tessellation; pipeline methods;
DOI
Not available
Chinese Library Classification (CLC) number
P413 [Data processing];
Discipline classification code
0706; 070601;
Abstract
Geophysical data sets are growing at an ever-increasing rate, requiring computationally efficient data selection (thinning) methods to preserve essential information. Satellites, such as WindSat, provide large data sets for assessing the accuracy and computational efficiency of data selection techniques. A new data thinning technique, based on support vector regression (SVR), is developed and tested. To manage large online satellite data streams, observations from WindSat are formed into subsets by Voronoi tessellation and then each subset is thinned by SVR (TSVR). Three experiments are performed. The first confirms the viability of TSVR for a relatively small sample, comparing it to several commonly used data thinning methods (random selection, averaging and Barnes filtering), producing a 10% thinning rate (90% data reduction), low mean absolute errors (MAE) and large correlations with the original data. A second experiment, using a larger dataset, shows TSVR retrievals with MAE < 1 m s⁻¹ and correlations 0.98. TSVR was an order of magnitude faster than the commonly used thinning methods. A third experiment applies a two-stage pipeline to TSVR to accommodate online data. The pipeline subsets reconstruct the wind field with the same accuracy as in the second experiment, and the pipeline is an order of magnitude faster than the nonpipeline TSVR. Therefore, pipeline TSVR is two orders of magnitude faster than commonly used thinning methods that ingest the entire data set. This study demonstrates that TSVR pipeline thinning is an accurate and computationally efficient alternative to commonly used data selection techniques.
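The workflow in the abstract (partition the observations into Voronoi cells, thin each cell, then score the reconstruction with MAE and correlation) can be illustrated with a minimal Python sketch. It is not the authors' TSVR implementation: the SVR step is replaced by simple cell averaging, one of the baseline methods the abstract names, so that the example runs on the standard library alone. The synthetic "wind speed" field, the choice of seeds, and all function names are assumptions made for illustration.

```python
import math
import random


def nearest_seed(point, seeds):
    """Index of the closest seed; this is the point's Voronoi cell."""
    return min(range(len(seeds)), key=lambda i: math.dist(point, seeds[i]))


def voronoi_subsets(points, seeds):
    """Group observation indices by the Voronoi cell they fall in."""
    cells = {i: [] for i in range(len(seeds))}
    for idx, p in enumerate(points):
        cells[nearest_seed(p, seeds)].append(idx)
    return cells


def thin_by_cell_average(points, values, seeds):
    """Baseline thinning (NOT SVR): keep one averaged value per cell.

    Returns the number of retained values and a reconstruction in which
    every observation is replaced by the mean of its cell.
    """
    reconstruction = [0.0] * len(values)
    kept = 0
    for members in voronoi_subsets(points, seeds).values():
        if not members:
            continue
        cell_mean = sum(values[i] for i in members) / len(members)
        kept += 1
        for i in members:
            reconstruction[i] = cell_mean
    return kept, reconstruction


def mean_absolute_error(truth, estimate):
    return sum(abs(t - e) for t, e in zip(truth, estimate)) / len(truth)


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Synthetic data (an assumption): a smooth scalar field on a 10 x 10 domain.
rng = random.Random(0)
points = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(1000)]
values = [5.0 + 0.8 * x + 0.3 * y for x, y in points]

# Using 100 observations as seeds gives 100 non-empty cells for 1000 points.
seeds = rng.sample(points, 100)

kept, recon = thin_by_cell_average(points, values, seeds)
thinning_rate = kept / len(points)
mae = mean_absolute_error(values, recon)
corr = pearson(values, recon)
print(f"thinning rate={thinning_rate:.2f}, MAE={mae:.3f}, correlation={corr:.3f}")
```

To move closer to the method in the abstract, the averaging step would be replaced by an SVR fit within each cell. How the thinned subset is then extracted from the fitted model is not stated in the abstract, so that step is left out here rather than guessed.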
Pages: 277-286
Number of pages: 10