A comparison of confidence/credible interval methods for the area under the ROC curve for continuous diagnostic tests with small sample size

Cited by: 32
Authors
Feng, Dai [1]
Cortese, Giuliana [2]
Baumgartner, Richard [1]
Affiliations
[1] Merck Res Lab, Biometr Res, Rahway, NJ USA
[2] Univ Padua, Dept Stat Sci, Padua, Italy
Keywords
AUC; small sample size; Mann-Whitney; empirical likelihood; kernel smoothing; jackknife; bootstrap; profile likelihood; Wald statistic; signed log-likelihood ratio statistic; higher order asymptotic; Behrens-Fisher problem; Bayesian MCMC; OPERATING CHARACTERISTIC CURVES; LIKELIHOOD INFERENCE; BANDWIDTH SELECTION; MEASUREMENT ERROR; VARIANCE TEST; PET;
DOI
10.1177/0962280215602040
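Chinese Library Classification number
R19 [Health care organization and services (health services administration)];
Discipline classification code
Abstract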
The receiver operating characteristic (ROC) curve is frequently used as a measure of accuracy of continuous markers in diagnostic tests. The area under the ROC curve (AUC) is arguably the most widely used summary index for the ROC curve. Although the small sample size scenario is common in medical tests, a comprehensive study of small sample size properties of various methods for the construction of the confidence/credible interval (CI) for the AUC has been by and large missing in the literature. In this paper, we describe and compare 29 non-parametric and parametric methods for the construction of the CI for the AUC when the number of available observations is small. The methods considered include not only those that have been widely adopted, but also those that have been less frequently mentioned or, to our knowledge, never applied to the AUC context. To compare different methods, we carried out a simulation study with data generated from binormal models with equal and unequal variances and from exponential models with various parameters and with equal and unequal small sample sizes. We found that the larger the true AUC value and the smaller the sample size, the larger the discrepancy among the results of different approaches. When the model is correctly specified, the parametric approaches tend to outperform the non-parametric ones. Moreover, in the non-parametric domain, we found that a method based on the Mann-Whitney statistic is in general superior to the others. We further elucidate potential issues and provide possible solutions, along with general guidance on the CI construction for the AUC when the sample size is small. Finally, we illustrate the utility of different methods through real-life examples.
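As a rough illustration of the kind of non-parametric interval the abstract refers to, the sketch below computes the empirical AUC as a Mann-Whitney statistic and wraps it in a Wald-type interval using a DeLong-style variance estimate. This is not claimed to be any specific one of the 29 methods compared in the paper; the function names (`mann_whitney_auc`, `delong_wald_interval`, `binormal_auc`) and the sample values are hypothetical, chosen only for this example. The `binormal_auc` helper uses the standard closed form for the AUC under a binormal model, which is the kind of "true AUC" a simulation study would compare intervals against.

```python
import math
from statistics import NormalDist, variance


def _psi(y, x):
    """Pair kernel: 1 if the diseased value exceeds the healthy one, 0.5 on ties."""
    return 1.0 if y > x else 0.5 if y == x else 0.0


def mann_whitney_auc(healthy, diseased):
    """Empirical AUC: average of the pair kernel over all (diseased, healthy) pairs."""
    total = sum(_psi(y, x) for y in diseased for x in healthy)
    return total / (len(healthy) * len(diseased))


def delong_wald_interval(healthy, diseased, level=0.95):
    """Wald-type interval for the AUC with a DeLong-style variance estimate.

    Assumes at least two observations per group. The interval is clipped
    to [0, 1], which is one simple (assumed) way to handle overshoot.
    """
    m, n = len(healthy), len(diseased)
    auc = mann_whitney_auc(healthy, diseased)
    # Placement values: per-observation averages of the pair kernel.
    v_dis = [sum(_psi(y, x) for x in healthy) / m for y in diseased]
    v_hea = [sum(_psi(y, x) for y in diseased) / n for x in healthy]
    var_auc = variance(v_dis) / n + variance(v_hea) / m
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * math.sqrt(var_auc)
    return auc, max(0.0, auc - half_width), min(1.0, auc + half_width)


def binormal_auc(mu0, sd0, mu1, sd1):
    """True AUC when healthy ~ N(mu0, sd0^2) and diseased ~ N(mu1, sd1^2)."""
    return NormalDist().cdf((mu1 - mu0) / math.sqrt(sd0 ** 2 + sd1 ** 2))


if __name__ == "__main__":
    healthy = [0.3, 1.1, -0.4, 0.8, 0.1]   # hypothetical marker values
    diseased = [1.5, 0.9, 2.2, 1.0, 2.7]   # hypothetical marker values
    print(delong_wald_interval(healthy, diseased))
    print(binormal_auc(0.0, 1.0, 1.0, 1.0))
```

With perfectly separated samples, this Wald-type interval collapses to the single point 1 because the estimated variance is zero, which is one concrete way a simple interval can misbehave when the sample is small and the AUC is large.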
Pages: 2603-2621
Number of pages: 19