Statistical significance: p value, 0.05 threshold, and applications to radiomics-reasons for a conservative approach

Cited by: 242
Authors
Di Leo, Giovanni [1]
Sardanelli, Francesco [1,2]
Affiliations
[1] IRCCS Policlin San Donato, Radiol Unit, Via Morandi 30, I-20097 San Donato Milanese, Italy
[2] Univ Milan, Dipartimento Sci Biomed Salute, Via Morandi 30, I-20097 San Donato Milanese, Italy
Keywords
Confidence intervals; Decision making; Models (statistical); Radiomics; Reproducibility of results
DOI
10.1186/s41747-020-0145-y
Chinese Library Classification (CLC) numbers
R8 [Special medicine]; R445 [Diagnostic imaging]
Discipline classification codes
1002; 100207; 1009
Abstract
Here, we summarise the unresolved debate about the p value and its dichotomisation. We present the American Statistical Association statement against the misuse of statistical significance, as well as the proposals to abandon the p value and to lower the significance threshold from 0.05 to 0.005. We highlight reasons for a conservative approach: clinical research needs dichotomous answers to guide decision-making, in particular in diagnostic imaging and interventional radiology. With a lower p value threshold, the cost of research could increase while spontaneous research could decrease. Secondary evidence from systematic reviews/meta-analyses, data sharing, and cost-effectiveness analyses are better ways to mitigate the false discovery rate and lack of reproducibility associated with the 0.05 threshold. Importantly, when reporting p values, authors should always provide the actual value, not only statements of "p < 0.05" or "p ≥ 0.05", because p values measure the degree to which the data are compatible with the null hypothesis. Notably, radiomics and big data, fuelled by the application of artificial intelligence, involve hundreds or thousands of tested features, as do other "omics" such as genomics, where a reduction in the significance threshold, based on well-known corrections for multiple testing, has already been adopted.
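The abstract notes that omics fields lower the per-test significance threshold through well-known multiple-testing corrections. As a minimal sketch (not the authors' method), the Python below implements two common corrections, Bonferroni and the Benjamini-Hochberg false discovery rate procedure. The function names, the feature count, and the toy p values are hypothetical illustrative inputs, not data from the paper.

```python
"""Sketch: how multiple-testing corrections lower the per-feature threshold.

Assumptions (not from the paper): hypothetical radiomic features, each tested
at a nominal alpha of 0.05; all p values below are made up for illustration.
"""


def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test threshold that controls the family-wise error rate at alpha."""
    return alpha / m


def benjamini_hochberg(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Return one reject flag per p value, controlling the FDR at alpha.

    Step-up procedure: sort p values ascending, find the largest rank k with
    p_(k) <= k * alpha / m, then reject the k smallest p values.
    """
    m = len(p_values)
    # Indices of the p values in ascending order of p.
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank  # keep the largest rank that meets its threshold
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    m = 1000  # hypothetical number of tested radiomic features
    alpha = 0.05
    # If no feature truly matters, about m * alpha pass p < 0.05 by chance.
    print("expected false positives at 0.05:", m * alpha)
    print("Bonferroni per-feature threshold:", bonferroni_threshold(alpha, m))

    toy_p = [0.00001, 0.0004, 0.003, 0.02, 0.04, 0.3, 0.6, 0.9]
    print("BH rejections:", benjamini_hochberg(toy_p, alpha))
```

With the eight toy p values, a plain p < 0.05 cut flags five features, the Bonferroni threshold (0.05/8 = 0.00625) keeps three, and Benjamini-Hochberg keeps four. Bonferroni controls the chance of any false positive, whereas Benjamini-Hochberg controls the expected proportion of false discoveries, which is why it is less strict.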
Pages: 8