Markov cross-validation for time series model evaluations

Times cited: 16
Authors
Jiang, Gaoxia [1]
Wang, Wenjian [1]
Affiliation
[1] Shanxi University, School of Computer and Information Technology, Taiyuan 030006, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Model evaluation; Markov cross-validation; Time series; Classification learning algorithms; Bootstrap; Variance
DOI
10.1016/j.ins.2016.09.061
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Cross-validation (CV) is a simple and universal tool for estimating generalization ability. However, existing CV methods do not handle the periodicity, overlap, or correlation of time series well. We present three criteria that describe these properties. Based on them, we propose a novel Markov cross-validation (M-CV), whose data partition can be viewed as a Markov process. The partition ensures that the samples in each subset are neither too close together nor too far apart. This avoids both overfitting the model and losing information from the series, which can lead to underestimating or overestimating the error. Furthermore, the subsets produced by the M-CV partition can represent the original series well, so the approach may also be extended to time series or stream data sampling. Theoretical analysis shows that, among current CV methods, M-CV is the only one that meets all of the above criteria. In addition, the error estimate on the subsets is proved to have lower variance than that on the original series, which ensures the stability of M-CV. Experimental results demonstrate that the proposed M-CV achieves lower bias, variance, and time consumption than other CV methods. (C) 2016 Elsevier Inc. All rights reserved.
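The abstract describes the M-CV partition only at a high level. As a rough illustration of the general idea of a partition driven by a Markov process, the Python sketch below assigns each time-ordered sample to a fold with a first-order Markov chain that never repeats the previous sample's fold, then builds hold-one-fold-out splits. This is not the authors' M-CV algorithm: the transition rule (uniform over the other folds), the function names `markov_fold_labels` and `cv_splits`, and the parameters are all hypothetical choices made for illustration.

```python
import random
from typing import Iterator, List, Optional, Tuple


def markov_fold_labels(n: int, k: int, seed: Optional[int] = None) -> List[int]:
    """Assign each of n time-ordered samples to one of k folds (k >= 2).

    Hypothetical illustration, not the M-CV rule from the paper: the fold of
    sample t depends only on the fold of sample t-1 (a first-order Markov
    chain), and the chain never stays in the same fold, so adjacent samples
    always land in different folds.
    """
    if k < 2:
        raise ValueError("need at least two folds")
    if n <= 0:
        return []
    rng = random.Random(seed)
    labels = [rng.randrange(k)]  # initial fold drawn uniformly
    for _ in range(1, n):
        prev = labels[-1]
        # Draw from {0, ..., k-2} and skip over prev, which gives a uniform
        # choice among the k-1 folds other than the previous one.
        nxt = rng.randrange(k - 1)
        if nxt >= prev:
            nxt += 1
        labels.append(nxt)
    return labels


def cv_splits(labels: List[int], k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices), holding out one fold at a time."""
    folds: List[List[int]] = [[] for _ in range(k)]
    for i, f in enumerate(labels):
        folds[f].append(i)  # indices stay in time order within each fold
    for j in range(k):
        train = sorted(i for m, fold in enumerate(folds) if m != j for i in fold)
        yield train, folds[j]


if __name__ == "__main__":
    labels = markov_fold_labels(20, 4, seed=0)
    print(labels)
    for train, test in cv_splits(labels, 4):
        print(f"train={len(train)} test={len(test)}")
```

This toy chain only addresses the "not too close" requirement: samples in the same fold are at least two steps apart and about k steps apart on average, but the gap has no upper bound, so it does not by itself keep same-fold samples from being too far apart, and it makes no attempt to satisfy the three criteria defined in the paper.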
Pages: 219-233
Page count: 15