Improving performance of spatio-temporal machine learning models using forward feature selection and target-oriented validation
Cited by: 314
Authors:
Meyer, Hanna [1]
Reudenbach, Christoph [1]
Hengl, Tomislav [2]
Katurji, Marwan [3]
Nauss, Thomas [1]
Affiliations:
[1] Philipps Univ Marburg, Fac Geog, Deutschhausstr 10, D-35037 Marburg, Germany
[2] ISRIC World Soil Informat, POB 363, NL-6700 AJ Wageningen, Netherlands
[3] Univ Canterbury, Ctr Atmospher Res, Private Bag 4800, Christchurch 8020, New Zealand
Keywords:
Cross-validation;
Feature selection;
Over-fitting;
Random forest;
Spatio-temporal;
Target-oriented validation;
AIR-TEMPERATURE;
CLASSIFICATION;
INTERPOLATION;
PRECIPITATION;
ALGORITHMS;
RETRIEVAL;
PLATEAU;
COVER;
DOI:
10.1016/j.envsoft.2017.12.001
Chinese Library Classification (CLC) number:
TP39 [Computer applications];
Discipline classification code:
081203 ;
0835 ;
Abstract:
Importance of target-oriented validation strategies for spatio-temporal prediction models is illustrated using two case studies: (1) modelling of air temperature (T-air) in Antarctica, and (2) modelling of volumetric water content (VW) for the R.J. Cook Agronomy Farm, USA. Performance of a random k-fold cross-validation (CV) was compared to three target-oriented strategies: Leave-Location-Out (LLO), Leave-Time-Out (LTO), and Leave-Location-and-Time-Out (LLTO) CV. Results indicate that considerable differences between random k-fold (R² = 0.9 for T-air and 0.92 for VW) and target-oriented CV (LLO R² = 0.24 for T-air and 0.49 for VW) exist, highlighting the need for target-oriented validation to avoid an overoptimistic view on models. Differences between random k-fold and target-oriented CV indicate spatial over-fitting caused by misleading variables. To decrease over-fitting, a forward feature selection in conjunction with target-oriented CV is proposed. It decreased over-fitting and simultaneously improved target-oriented performances (LLO CV R² = 0.47 for T-air and 0.55 for VW). © 2017 Elsevier Ltd. All rights reserved.
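The three target-oriented strategies named in the abstract differ only in which samples are held out together. The following is a minimal Python sketch of that idea using the standard library only; it is not the authors' implementation. The function names (`partition_groups`, `leave_group_out_folds`, `leave_location_and_time_out_folds`), the toy station and day labels, and the choice to drop samples that share only a location or only a time step with the test set in LLTO are all assumptions made for illustration.

```python
import random
from typing import Hashable, Iterator, List, Sequence, Set, Tuple

Fold = Tuple[List[int], List[int]]  # (train indices, test indices)


def partition_groups(groups: Sequence[Hashable], k: int, seed: int = 0) -> List[Set[Hashable]]:
    """Split the distinct group labels into k disjoint sets (hypothetical helper)."""
    unique = sorted(set(groups))
    if not 2 <= k <= len(unique):
        raise ValueError("k must be between 2 and the number of distinct groups")
    random.Random(seed).shuffle(unique)
    return [set(unique[i::k]) for i in range(k)]


def leave_group_out_folds(groups: Sequence[Hashable], k: int, seed: int = 0) -> Iterator[Fold]:
    """LLO when `groups` holds location labels, LTO when it holds time labels."""
    for held_out in partition_groups(groups, k, seed):
        train = [i for i, g in enumerate(groups) if g not in held_out]
        test = [i for i, g in enumerate(groups) if g in held_out]
        yield train, test


def leave_location_and_time_out_folds(
    locations: Sequence[Hashable], times: Sequence[Hashable], k: int, seed: int = 0
) -> Iterator[Fold]:
    """LLTO: test on held-out locations at held-out times.

    Assumption: samples sharing only the location or only the time step
    with the test set are used neither for training nor for testing.
    """
    loc_parts = partition_groups(locations, k, seed)
    time_parts = partition_groups(times, k, seed + 1)
    for held_locs, held_times in zip(loc_parts, time_parts):
        train, test = [], []
        for i, (loc, t) in enumerate(zip(locations, times)):
            if loc in held_locs and t in held_times:
                test.append(i)
            elif loc not in held_locs and t not in held_times:
                train.append(i)
        yield train, test


# Toy data: three stations, each observed on three days (made-up labels).
stations = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
days = [1, 2, 3] * 3

llo = list(leave_group_out_folds(stations, k=3))
lto = list(leave_group_out_folds(days, k=3))
llto = list(leave_location_and_time_out_folds(stations, days, k=3))
```

A random k-fold split would instead shuffle the sample indices directly, so the same station can appear in both training and test data. Under the abstract's argument, that overlap is what lets the random k-fold score look better than the score for unseen locations.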
Pages: 1-9
Number of pages: 9
Related papers