Testing prediction algorithms as null hypotheses: Application to assessing the performance of deep neural networks

Cited: 3
Authors
Bickel, David R. [1 ]
Affiliations
[1] Univ Ottawa, Dept Math & Stat, Dept Biochem Microbiol & Immunol, Ottawa Inst Syst Biol, 451 Smyth Rd, Ottawa, ON K1H 8M5, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
big data; data science; deep learning; deep neural network; model predictive distribution; model predictive p value; regression; false discovery rate; proportion;
DOI
10.1002/sta4.270
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline Codes
020208 ; 070103 ; 0714 ;
Abstract
Bayesian models use posterior predictive distributions to quantify the uncertainty of their predictions. Similarly, the point predictions of neural networks and other machine learning algorithms may be converted to predictive distributions by various bootstrap methods. The predictive performance of each algorithm can then be assessed by quantifying the performance of its predictive distribution. Previous methods for assessing such performance are relative: they indicate whether certain algorithms perform better than others. This paper proposes performance measures that are absolute in the sense that they indicate whether an algorithm performs adequately, without requiring comparisons with other algorithms. The first proposed measure is a predictive p value that generalizes a prior predictive p value with the prior distribution equal to the posterior distribution of previous data. The other proposed measures use the generalized predictive p value of each prediction to estimate the proportion of target values compatible with the predictive distribution. The new measures are illustrated by evaluating the predictive performance of deep neural networks applied to a large housing-price data set that serves as a machine learning benchmark.
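To make the workflow concrete, here is a minimal sketch (in Python, with scikit-learn) of the kind of procedure the abstract describes: bagging a base regression model to form a bootstrap predictive distribution for each test case, computing a two-sided predictive p value for each observed target, and estimating the proportion of targets compatible with the predictions. The linear base learner, the function names, and the exact two-sided tail definition are illustrative assumptions, not the paper's construction.

import numpy as np
from sklearn.linear_model import LinearRegression

def bootstrap_predictive_samples(X_train, y_train, X_test, n_boot=200, seed=0):
    # Bagging: refit the base model on bootstrap resamples of the
    # training data; the spread of the resulting point predictions
    # approximates a predictive distribution for each test case.
    rng = np.random.default_rng(seed)
    n = len(y_train)
    samples = np.empty((n_boot, len(X_test)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample with replacement
        model = LinearRegression().fit(X_train[idx], y_train[idx])
        samples[b] = model.predict(X_test)
    return samples  # shape: (n_boot, n_test)

def predictive_p_values(samples, y_test):
    # Two-sided predictive p value: how extreme each observed target
    # is relative to its bootstrap predictive distribution.
    lower = (samples <= y_test).mean(axis=0)
    return 2.0 * np.minimum(lower, 1.0 - lower)

def compatible_proportion(p_values, alpha=0.05):
    # Absolute performance measure: the estimated proportion of target
    # values compatible with the predictive distribution at level alpha.
    return float((p_values > alpha).mean())

Under these assumptions, a compatible proportion near 1 - alpha would suggest adequately calibrated predictive distributions, whereas a much smaller proportion would flag the algorithm as inadequate on its own terms, without any comparison to a competing algorithm.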
Pages: 8