Valid sequential inference on probability forecast performance

Cited by: 15
Authors
Henzi, Alexander [1]
Ziegel, Johanna F. [1]
Affiliations
[1] Univ Bern, Inst Math Stat & Actuarial Sci, Alpeneggstr 22, CH-3012 Bern, Switzerland
Funding
Swiss National Science Foundation
Keywords
Consistent scoring function; E-value; Forecast dominance; Optional stopping; Probability forecast; Proper scoring rule; Sequential inference; PREDICTION; EXPECTILES; QUANTILES; ECMWF
DOI
10.1093/biomet/asab047
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Probability forecasts for binary events play a central role in many applications. Their quality is commonly assessed with proper scoring rules, which assign forecasts numerical scores such that a correct forecast achieves a minimal expected score. In this paper, we construct e-values for testing the statistical significance of score differences of competing forecasts in sequential settings. E-values have been proposed as an alternative to p-values for hypothesis testing, and they can easily be transformed into conservative p-values by taking the multiplicative inverse. The e-values proposed in this article are valid in finite samples without any assumptions on the data-generating processes. They also allow optional stopping, so a forecast user may decide to interrupt evaluation, taking into account the available data at any time, and still draw statistically valid inference, which is generally not true for classical p-value-based tests. In a case study on post-processing of precipitation forecasts, state-of-the-art forecast dominance tests and e-values lead to the same conclusions.
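As a rough illustration of two ideas named in the abstract, the sketch below computes the Brier score (one common proper scoring rule for binary events) and converts an e-value into a conservative p-value by taking the multiplicative inverse, capped at 1. It does not reproduce the e-value construction of the paper; the function names `brier_score`, `expected_brier`, and `e_to_p` are hypothetical and chosen only for this example.

```python
def brier_score(forecast: float, outcome: int) -> float:
    """Brier score of a probability forecast for a binary outcome (0 or 1).

    Lower is better; this is one example of a proper scoring rule.
    """
    return (forecast - outcome) ** 2


def expected_brier(forecast: float, true_prob: float) -> float:
    """Expected Brier score when the event occurs with probability true_prob."""
    return (true_prob * brier_score(forecast, 1)
            + (1.0 - true_prob) * brier_score(forecast, 0))


def e_to_p(e_value: float) -> float:
    """Conservative p-value from a nonnegative e-value: min(1, 1/e).

    An e-value of 0 carries no evidence against the null, so it maps to 1.
    """
    if e_value < 0:
        raise ValueError("e-values must be nonnegative")
    if e_value == 0:
        return 1.0
    return min(1.0, 1.0 / e_value)


if __name__ == "__main__":
    # Assumed toy setting: the event truly occurs with probability 0.3.
    print(expected_brier(0.3, 0.3))  # forecast equal to the true probability
    print(expected_brier(0.5, 0.3))  # a miscalibrated forecast scores worse
    print(e_to_p(20.0))              # e-value of 20 gives a p-value of 1/20
```

Under this conversion, an e-value of at least 20 corresponds to rejecting at the 5% level, and e-values below 1 never yield a p-value smaller than 1.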
Pages: 647-663
Page count: 17
Related papers
32 in total
[1]   A comparison of the ECMWF, MSC, and NCEP global ensemble prediction systems [J].
Buizza, R ;
Houtekamer, PL ;
Toth, Z ;
Pellerin, G ;
Wei, MZ ;
Zhu, YJ .
MONTHLY WEATHER REVIEW, 2005, 133 (05) :1076-1097
[2]   COMPARING PREDICTIVE ACCURACY [J].
DIEBOLD, FX ;
MARIANO, RS .
JOURNAL OF BUSINESS & ECONOMIC STATISTICS, 1995, 13 (03) :253-263
[3]   Forecast dominance testing via sign randomization [J].
Ehm, Werner ;
Krueger, Fabian .
ELECTRONIC JOURNAL OF STATISTICS, 2018, 12 (02) :3758-3793
[4]   Of quantiles and expectiles: consistent scoring functions, Choquet representations and forecast rankings [J].
Ehm, Werner ;
Gneiting, Tilmann ;
Jordan, Alexander ;
Krueger, Fabian .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2016, 78 (03) :505-562
[5]   Tests of conditional predictive ability [J].
Giacomini, Raffaella ;
White, Halbert .
ECONOMETRICA, 2006, 74 (06) :1545-1578
[6]   Strictly proper scoring rules, prediction, and estimation [J].
Gneiting, Tilmann ;
Raftery, Adrian E. .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2007, 102 (477) :359-378
[7]   Probabilistic forecasts, calibration and sharpness [J].
Gneiting, Tilmann ;
Balabdaoui, Fadoua ;
Raftery, Adrian E. .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2007, 69 :243-268
[8]   Combining predictive distributions [J].
Gneiting, Tilmann ;
Ranjan, R. .
ELECTRONIC JOURNAL OF STATISTICS, 2013, 7 :1747-1782
[9]   Making and Evaluating Point Forecasts [J].
Gneiting, Tilmann .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2011, 106 (494) :746-762
[10]   Safe Testing [J].
Gruenwald, Peter ;
de Heide, Rianne ;
Koolen, Wouter M. .
2020 INFORMATION THEORY AND APPLICATIONS WORKSHOP (ITA), 2020,