Workflow Techniques for the Robust Use of Bayes Factors

Cited by: 60
Authors
Schad, Daniel J. [1, 2, 3]
Nicenboim, Bruno [2, 3]
Bürkner, Paul-Christian [4]
Betancourt, Michael [5]
Vasishth, Shravan [3]
Affiliations
[1] Hlth & Med Univ Potsdam, Dept Psychol, Potsdam, Germany
[2] Tilburg Univ, Dept Cognit Sci & Artificial Intelligence, Tilburg, Netherlands
[3] Univ Potsdam, Dept Linguist, Potsdam, Germany
[4] Univ Stuttgart, Cluster Excellence SimTech, Stuttgart, Germany
[5] Symplectomorphic, New York, NY USA
Keywords
Bayes factors; Bayesian model comparison; prior; posterior; simulation-based calibration; AGREEMENT ATTRACTION; PRIOR SENSITIVITY; SPECIAL-ISSUE; TUTORIAL; TESTS;
DOI
10.1037/met0000472
Chinese Library Classification number
B84 [Psychology]
Discipline classification code
04; 0402
Abstract
Inferences about hypotheses are ubiquitous in the cognitive sciences. Bayes factors provide one general way to compare different hypotheses by their compatibility with the observed data. Those quantifications can then also be used to choose between hypotheses. While Bayes factors provide an immediate approach to hypothesis testing, they are highly sensitive to details of the data/model assumptions, and it is unclear whether the details of the computational implementation (such as bridge sampling) are unbiased for complex analyses. Here, we study how Bayes factors misbehave under different conditions. This includes a study of errors in the estimation of Bayes factors; the first-ever use of simulation-based calibration to test the accuracy and bias of Bayes factor estimates using bridge sampling; a study of the stability of Bayes factors against different MCMC draws and sampling variation in the data; and a look at the variability of decisions based on Bayes factors using a utility function. We outline a Bayes factor workflow that researchers can use to study whether Bayes factors are robust for their individual analysis. Reproducible code is available from https://osf.io/y354c/.
Translational Abstract
In psychology and related areas, scientific hypotheses are commonly tested by asking questions like "is [some] effect present or absent." Such hypothesis testing is most often carried out using frequentist null hypothesis significance testing (NHST). The NHST procedure is very simple: It usually returns a p-value, which is then used to make binary decisions like "the effect is present/absent." For example, it is common to see studies in the media that draw simplistic conclusions like "coffee causes cancer," or "coffee reduces the chances of getting cancer." However, a powerful and more nuanced alternative approach exists: Bayes factors. Bayes factors have many advantages over NHST. However, for the complex statistical models that are commonly used for data analysis today, computing Bayes factors is not at all a simple matter. In this article, we discuss the main complexities associated with computing Bayes factors. This is the first article to provide a detailed workflow for understanding and computing Bayes factors in complex statistical models. The article provides a statistically more nuanced way to think about hypothesis testing than the overly simplistic tendency to declare effects as being "present" or "absent".
Pages: 1404-1426
Page count: 23