Data selection using support vector regression

Cited by: 6
Authors
Richman, Michael B. [1, 2]
Leslie, Lance M. [1, 2]
Trafalis, Theodore B. [3]
Mansouri, Hicham [4]
Affiliations
[1] Univ Oklahoma, Sch Meteorol, Norman, OK 73072 USA
[2] Univ Oklahoma, Cooperat Inst Mesoscale Meteorol Studies, Norman, OK 73072 USA
[3] Univ Oklahoma, Sch Ind & Syst Engn, Norman, OK 73019 USA
[4] Power Costs Inc, Norman, OK 73072 USA
Keywords
data selection; data thinning; machine learning; support vector regression; Voronoi tessellation; pipeline methods;
DOI
10.1007/s00376-014-4072-9
Chinese Library Classification number
P4 [Atmospheric sciences (meteorology)];
Subject classification code
0706 ; 070601 ;
Abstract
Geophysical data sets are growing at an ever-increasing rate, requiring computationally efficient data selection (thinning) methods to preserve essential information. Satellites, such as WindSat, provide large data sets for assessing the accuracy and computational efficiency of data selection techniques. A new data thinning technique, based on support vector regression (SVR), is developed and tested. To manage large on-line satellite data streams, observations from WindSat are formed into subsets by Voronoi tessellation and then each subset is thinned by SVR (TSVR). Three experiments are performed. The first confirms the viability of TSVR for a relatively small sample, comparing it to several commonly used data thinning methods (random selection, averaging and Barnes filtering), producing a 10% thinning rate (90% data reduction), low mean absolute errors (MAE) and large correlations with the original data. A second experiment, using a larger dataset, shows TSVR retrievals with MAE < 1 m s⁻¹ and correlations ≥ 0.98. TSVR was an order of magnitude faster than the commonly used thinning methods. A third experiment applies a two-stage pipeline to TSVR, to accommodate online data. The pipeline subsets reconstruct the wind field with the same accuracy as in the second experiment, and pipeline TSVR is an order of magnitude faster than the nonpipeline TSVR. Therefore, pipeline TSVR is two orders of magnitude faster than commonly used thinning methods that ingest the entire data set. This study demonstrates that TSVR pipeline thinning is an accurate and computationally efficient alternative to commonly used data selection techniques.
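The workflow in the abstract (partition the observations into Voronoi subsets, thin each subset, then score the thinned set by MAE and correlation against the original data) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the observations are synthetic rather than WindSat data, the names `voronoi_subsets`, `thin_random` and `reconstruct` are hypothetical, the field is rebuilt by nearest-neighbour lookup, and the per-subset SVR step is replaced by random selection (one of the baseline methods named in the abstract) so that the example needs only the standard library.

```python
import math
import random


def voronoi_subsets(obs, seeds):
    """Assign each observation (x, y, value) to the cell of its nearest seed."""
    cells = [[] for _ in seeds]
    for x, y, v in obs:
        nearest = min(
            range(len(seeds)),
            key=lambda i: (x - seeds[i][0]) ** 2 + (y - seeds[i][1]) ** 2,
        )
        cells[nearest].append((x, y, v))
    return cells


def thin_random(cell, rate, rng):
    """Stand-in for the SVR step: keep about `rate` of a cell at random."""
    if not cell:
        return []
    k = max(1, round(rate * len(cell)))
    return rng.sample(cell, k)


def reconstruct(obs, kept):
    """Estimate every original value from the nearest retained observation."""
    estimates = []
    for x, y, _ in obs:
        _, _, value = min(kept, key=lambda p: (x - p[0]) ** 2 + (y - p[1]) ** 2)
        estimates.append(value)
    return estimates


def mae(a, b):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def correlation(a, b):
    """Pearson correlation between two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(0)

# Synthetic "wind speed" observations on a 10 x 10 domain (assumption).
obs = []
for _ in range(2000):
    x, y = rng.uniform(0, 10), rng.uniform(0, 10)
    speed = 8 + 3 * math.sin(x / 2) + 2 * math.cos(y / 3) + rng.gauss(0, 0.2)
    obs.append((x, y, speed))

# Random seed points define the Voronoi cells (assumption: 16 cells).
seeds = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(16)]
cells = voronoi_subsets(obs, seeds)

# Thin every cell independently to roughly a 10% retention rate.
kept = [p for cell in cells for p in thin_random(cell, 0.10, rng)]

truth = [v for _, _, v in obs]
estimate = reconstruct(obs, kept)
print(f"retained {len(kept)} of {len(obs)} observations")
print(f"MAE = {mae(truth, estimate):.3f}, r = {correlation(truth, estimate):.3f}")
```

Because each cell is thinned independently, the cells could be processed as they arrive or in parallel, which is the property a pipeline over online data relies on. Swapping `thin_random` for an SVR-based selector would change only that one function.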
Pages: 277-286
Number of pages: 10