Computation of the distribution of model accuracy statistics in machine learning: Comparison between analytically derived distributions and simulation-based methods

Cited by: 19
Authors
Huang, Alexander A. [1]
Huang, Samuel Y. [2]
Affiliations
[1] Northwestern Univ, Feinberg Sch Med, Chicago, IL USA
[2] Virginia Commonwealth Univ, Virginia Commonwealth Sch Med, Richmond, VA 23298 USA
Keywords
Anderson-Darling; bootstrap; Gaussian distribution; normal distribution; simulation; sufficient statistics; variance calculations; Whitney-Mann; CURVES;
DOI
10.1002/hsr2.1214
Chinese Library Classification number
R1 [Preventive medicine, hygiene];
Discipline classification code
1004 ; 120402 ;
Abstract
Background and Aims: All fields have seen an increase in the use of machine-learning techniques. To accurately evaluate the efficacy of novel modeling methods, it is necessary to critically evaluate the model metrics used, such as sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC). For commonly used model metrics, we proposed the use of analytically derived distributions (ADDs) and compared them with simulation-based approaches.
Methods: A retrospective cohort study was conducted using the England National Health Services Heart Disease Prediction Cohort. Four machine learning models (XGBoost, Random Forest, Artificial Neural Network, and Adaptive Boost) were used. The distributions of the model metrics and covariate gain statistics were empirically derived using bootstrap simulation (N = 10,000). The ADDs were created from analytic formulas from the covariates to describe the distribution of the model metrics and were compared with those of the bootstrap simulation.
Results: XGBoost was the best-performing model, with the highest AUROC and the highest aggregate score across six other model metrics. Based on the Anderson-Darling test, the distributions of the model metrics created from the bootstrap did not deviate significantly from a normal distribution. The ADDs yielded smaller SDs than those derived from bootstrap simulation, whereas the rest of the distribution did not differ significantly.
Conclusions: ADDs allow cross-study comparison of model metrics. Such comparison is usually done with bootstrapping, which relies on simulations that the reader cannot replicate.
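
The comparison described in the abstract can be illustrated with a minimal sketch for a single metric, sensitivity. The abstract does not give the authors' ADD formulas, so the closed form used here is an assumption: the standard binomial standard error, sqrt(p(1 - p) / n_pos). The data are synthetic, and the function names (`sensitivity`, `bootstrap_sensitivity`, `analytic_sd`) and the 2,000 replicates (the study used 10,000) are chosen only for illustration.

```python
import math
import random
import statistics


def sensitivity(y_true, y_pred):
    """True positive rate: TP / (number of actual positives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return tp / sum(y_true)


def bootstrap_sensitivity(y_true, y_pred, n_boot=2000, seed=0):
    """Resample cases with replacement and recompute sensitivity each time."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if sum(yt) == 0:
            continue  # sensitivity is undefined without positives
        yp = [y_pred[i] for i in idx]
        stats.append(sensitivity(yt, yp))
    return stats


def analytic_sd(p_hat, n_pos):
    """Assumed closed form: binomial standard error of a proportion."""
    return math.sqrt(p_hat * (1.0 - p_hat) / n_pos)


# Synthetic example (not the study data): 200 positives, 160 of them detected;
# 300 negatives, 30 of them falsely flagged.
y_true = [1] * 200 + [0] * 300
y_pred = [1] * 160 + [0] * 40 + [1] * 30 + [0] * 270

p_hat = sensitivity(y_true, y_pred)
boot = bootstrap_sensitivity(y_true, y_pred)
sd_boot = statistics.stdev(boot)
sd_analytic = analytic_sd(p_hat, sum(y_true))

print(f"sensitivity = {p_hat:.3f}")
print(f"bootstrap SD = {sd_boot:.4f}")
print(f"analytic SD  = {sd_analytic:.4f}")
```

The analytic SD depends only on the point estimate and the number of positive cases, so a reader can recompute it from reported figures, whereas the bootstrap SD depends on the resampling run and its random seed.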
Pages: 9