Cross-validation for change-point regression: Pitfalls and solutions

Cited by: 1
Authors
Pein, Florian [1]
Shah, Rajen D. [2]
Affiliations
[1] Univ Lancaster, Lancaster, England
[2] Univ Cambridge, Cambridge, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Change-point regression; cross-validation; segment neighbourhood; sample-splitting; selection consistency; tuning parameter selection; BINARY SEGMENTATION; NUMBER; CONSISTENCY; ALGORITHM;
DOI
10.3150/24-BEJ1732
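Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification code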
020208 ; 070103 ; 0714 ;
Abstract
Cross-validation is the standard approach for tuning parameter selection in many non-parametric regression problems. However, its use is less common in change-point regression, perhaps as its prediction error-based criterion may appear to permit small spurious changes and hence be less well-suited to estimation of the number and location of change-points. We show that in fact the problems of cross-validation with squared error loss are more severe and can lead to systematic under- or over-estimation of the number of change-points, and highly suboptimal estimation of the mean function in simple settings where changes are easily detectable. We propose two simple approaches to remedy these issues, the first involving the use of absolute error rather than squared error loss, and the second involving modifying the holdout sets used. For the latter, we provide conditions that permit consistent estimation of the number of change-points for a general change-point estimation procedure. We show these conditions are satisfied for least squares estimation using new results on its performance when supplied with the incorrect number of change-points. Numerical experiments show that our new approaches are competitive with common change-point methods using classical tuning parameter choices when error distributions are well-specified, but can substantially outperform these in misspecified models. An implementation of our methodology is available in the R package crossvalidationCP on CRAN.
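To make the first remedy in the abstract concrete, the following is a minimal Python sketch of choosing the number of change-points by cross-validation with either squared or absolute error loss. It is an illustration only, not the procedure from the paper and not the crossvalidationCP API: the two-fold odd/even split, the left-neighbour prediction rule, and the function names `segment_fits`, `fitted_values` and `cv_select_k` are all assumptions made for this example. The second remedy (modified holdout sets) is not shown.

```python
import numpy as np


def segment_fits(y, max_k):
    """Least-squares change-point sets for k = 0..max_k (dynamic programming)."""
    n = len(y)
    s1 = np.concatenate(([0.0], np.cumsum(y)))
    s2 = np.concatenate(([0.0], np.cumsum(y ** 2)))

    def cost(i, j):
        # Residual sum of squares of y[i:j] around its own mean.
        return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / (j - i)

    # best[k, j]: minimal cost of fitting y[:j] with exactly k change-points.
    best = np.full((max_k + 1, n + 1), np.inf)
    last = np.zeros((max_k + 1, n + 1), dtype=int)
    best[0, 1:] = cost(0, np.arange(1, n + 1))
    for k in range(1, max_k + 1):
        for j in range(k + 1, n + 1):
            starts = np.arange(k, j)  # candidate start of the final segment
            cands = best[k - 1, starts] + cost(starts, j)
            pick = int(np.argmin(cands))
            best[k, j] = cands[pick]
            last[k, j] = starts[pick]
    fits = []
    for k in range(max_k + 1):
        cps, j = [], n
        for kk in range(k, 0, -1):  # backtrack through the stored split points
            j = last[kk, j]
            cps.append(int(j))
        fits.append(sorted(cps))
    return fits


def fitted_values(y, cps):
    """Piecewise-constant fit: each segment mean repeated over its segment."""
    bounds = [0] + list(cps) + [len(y)]
    return np.concatenate(
        [np.full(b - a, y[a:b].mean()) for a, b in zip(bounds[:-1], bounds[1:])]
    )


def cv_select_k(y, max_k, loss="absolute"):
    """Pick the number of change-points by two-fold (odd/even) cross-validation."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    even, odd = np.arange(0, n, 2), np.arange(1, n, 2)
    crit = np.zeros(max_k + 1)
    for train, test in ((even, odd), (odd, even)):
        fits = segment_fits(y[train], max_k)
        # Assumption: predict a held-out point by the fit at its left
        # training neighbour (the first point uses its right neighbour).
        nb = np.clip(np.searchsorted(train, test) - 1, 0, len(train) - 1)
        for k, cps in enumerate(fits):
            resid = y[test] - fitted_values(y[train], cps)[nb]
            if loss == "absolute":
                crit[k] += np.sum(np.abs(resid))
            else:
                crit[k] += np.sum(resid ** 2)
    return int(np.argmin(crit)), crit
```

A call such as `cv_select_k(y, max_k=4, loss="absolute")` returns the selected number of change-points together with the criterion value for each candidate. In this sketch, a held-out point that sits immediately after a change can be predicted from the segment before it, and squared loss weights that single large error far more heavily than absolute loss does.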
Pages: 388-411
Page count: 24