Statistical inference for diagnostic test accuracy studies with multiple comparisons

Cited by: 0
Authors
Westphal, Max [1 ,3 ]
Zapf, Antonia [2 ]
Affiliations
[1] Fraunhofer Inst Digital Med MEVIS, Bremen, Germany
[2] Univ Med Ctr Hamburg Eppendorf, Dept Med Biometry & Epidemiol, Hamburg, Germany
[3] Fraunhofer MEVIS, Max von Laue Str 2, D-28359 Bremen, Germany
Keywords
Diagnosis; medical testing; multiple testing; model selection; prediction; prognosis; nuclear features; breast cancer; regularization; models
DOI
10.1177/09622802241236933
Chinese Library Classification (CLC)
R19 [Health organization and services (health service management)]
Abstract
Diagnostic accuracy studies assess the sensitivity and specificity of a new index test relative to an established comparator or a reference standard. The development and selection of the index test are usually assumed to have been completed before the accuracy study. In practice, this assumption is often violated, for instance when the (apparently) best biomarker, model, or cutpoint is chosen on the same data that are later used for validation. In this work, we investigate several multiple comparison procedures that provide family-wise error rate control for the resulting multiple testing problem. Because of the co-primary nature of the hypotheses (sensitivity and specificity must both be demonstrated), conventional multiplicity adjustments are too conservative for this specific problem and need to be adapted. In an extensive simulation study, five multiple comparison procedures, covering parametric and non-parametric methods as well as one Bayesian approach, are compared with regard to statistical error rates in least-favourable and realistic scenarios. All methods are implemented in the new open-source R package cases, which allows all simulation results to be reproduced. Based on our numerical results, we conclude that the parametric approaches (maxT and Bonferroni) are easy to apply but can have inflated type I error rates for small sample sizes. The two investigated bootstrap procedures, in particular the so-called pairs bootstrap, provide family-wise error rate control in finite samples and also have competitive statistical power.
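The co-primary logic described in the abstract can be illustrated with a minimal sketch. This is not the cases package API; the function name, the Wald lower confidence bounds, and the example thresholds are illustrative assumptions. Because sensitivity and specificity form a co-primary (intersection-union) hypothesis, each endpoint is tested at the full per-comparison level, and the Bonferroni correction is applied only across the k candidate index tests:

```python
from math import sqrt
from statistics import NormalDist

def bonferroni_coprimary(counts, se0, sp0, alpha=0.05):
    """Illustrative sketch (hypothetical helper, not the cases package).

    counts: list of (tp, fn, tn, fp) tuples, one per candidate index test.
    For each candidate j, the co-primary null H0_j: Se_j <= se0 OR Sp_j <= sp0
    is rejected only if one-sided Wald lower bounds for both sensitivity and
    specificity exceed their thresholds. Intersection-union logic: the level
    alpha/k is split over the k candidates, not over the two endpoints.
    """
    k = len(counts)
    z = NormalDist().inv_cdf(1 - alpha / k)  # one-sided critical value
    decisions = []
    for tp, fn, tn, fp in counts:
        se, n1 = tp / (tp + fn), tp + fn  # diseased subjects
        sp, n0 = tn / (tn + fp), tn + fp  # non-diseased subjects
        se_lo = se - z * sqrt(se * (1 - se) / n1)  # Wald lower bound for Se
        sp_lo = sp - z * sqrt(sp * (1 - sp) / n0)  # Wald lower bound for Sp
        decisions.append(se_lo > se0 and sp_lo > sp0)
    return decisions
```

As the abstract notes, such Wald-based parametric bounds can be anti-conservative in small samples, which motivates the bootstrap alternatives studied in the paper.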
Pages: 669-680 (12 pages)