The generalizability crisis

Cited by: 429
Authors
Yarkoni, Tal [1]
Affiliations
[1] Univ Texas Austin, Dept Psychol, Austin, TX 78712 USA
Funding
Engineering and Physical Sciences Research Council (UK);
Keywords
Generalization; inference; philosophy of science; psychology; random effects; statistics; REGISTERED REPLICATION REPORT; GENETIC ASSOCIATIONS; THEORETICAL RISKS; SOCIAL-PSYCHOLOGY; T TESTS; SCIENCE; PERSPECTIVES; STATISTICS; POWER; POLYMORPHISM;
DOI
10.1017/S0140525X20001685
Chinese Library Classification (CLC) number
B84 [Psychology];
Discipline classification code
04 ; 0402 ;
Abstract
Most theories and hypotheses in psychology are verbal in nature, yet their evaluation overwhelmingly relies on inferential statistical procedures. The validity of the move from qualitative to quantitative analysis depends on the verbal and statistical expressions of a hypothesis being closely aligned - that is, that the two must refer to roughly the same set of hypothetical observations. Here, I argue that many applications of statistical inference in psychology fail to meet this basic condition. Focusing on the most widely used class of model in psychology - the linear mixed model - I explore the consequences of failing to statistically operationalize verbal hypotheses in a way that respects researchers' actual generalization intentions. I demonstrate that although the "random effect" formalism is used pervasively in psychology to model intersubject variability, few researchers accord the same treatment to other variables they clearly intend to generalize over (e.g., stimuli, tasks, or research sites). The under-specification of random effects imposes far stronger constraints on the generalizability of results than most researchers appreciate. Ignoring these constraints can dramatically inflate false-positive rates, and often leads researchers to draw sweeping verbal generalizations that lack a meaningful connection to the statistical quantities they are putatively based on. I argue that failure to take the alignment between verbal and statistical expressions seriously lies at the heart of many of psychology's ongoing problems (e.g., the replication crisis), and conclude with a discussion of several potential avenues for improvement.
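The abstract's claim that under-specified random effects inflate false-positive rates can be illustrated with a small simulation. This is a minimal sketch and not the article's own analysis: it assumes a two-condition design with different stimuli in each condition, no true condition effect, and normally distributed stimulus and trial noise. In place of a full linear mixed model it uses a by-stimulus (aggregated) t-test as a simple stand-in for treating stimuli as random. All names, parameter values, and the hard-coded critical values (which match the default sample sizes only) are illustrative assumptions.

```python
import random
from math import sqrt


def mean(xs):
    return sum(xs) / len(xs)


def pooled_t(a, b):
    """Two-sample Student t statistic with pooled variance."""
    ma, mb = mean(a), mean(b)
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((x - mb) ** 2 for x in b)
    df = len(a) + len(b) - 2
    se = sqrt((ss_a + ss_b) / df * (1 / len(a) + 1 / len(b)))
    return (ma - mb) / se


def simulate_false_positive_rates(n_sims=1000, n_stimuli=8, n_obs=20,
                                  stim_sd=0.5, noise_sd=1.0, seed=0,
                                  crit_naive=1.967, crit_items=2.145):
    """Return (naive_rate, by_stimulus_rate) when the true effect is zero.

    crit_naive and crit_items are two-sided 5% critical t values for
    df = 318 and df = 14, i.e. they assume the default n_stimuli and n_obs.
    """
    rng = random.Random(seed)
    naive_hits = 0
    item_hits = 0
    for _sim in range(n_sims):
        observations = {0: [], 1: []}
        stimulus_means = {0: [], 1: []}
        for cond in (0, 1):
            for _stim in range(n_stimuli):
                # Each stimulus has its own random offset; condition adds nothing.
                stim_effect = rng.gauss(0, stim_sd)
                trials = [stim_effect + rng.gauss(0, noise_sd)
                          for _trial in range(n_obs)]
                observations[cond].extend(trials)
                stimulus_means[cond].append(mean(trials))
        # Naive test: every trial treated as independent, stimuli ignored.
        if abs(pooled_t(observations[0], observations[1])) > crit_naive:
            naive_hits += 1
        # By-stimulus test: one mean per stimulus, so stimulus variability
        # enters the error term.
        if abs(pooled_t(stimulus_means[0], stimulus_means[1])) > crit_items:
            item_hits += 1
    return naive_hits / n_sims, item_hits / n_sims


if __name__ == "__main__":
    naive_rate, by_stimulus_rate = simulate_false_positive_rates()
    print(f"naive: {naive_rate:.3f}, by-stimulus: {by_stimulus_rate:.3f}")
```

Under these assumptions the naive test rejects the null far more often than the nominal 5%, while the by-stimulus test stays close to it. A mixed model with random effects for stimuli would be the more general remedy; aggregation is used here only to keep the sketch dependency-free.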
Pages: 16
Related papers
113 in total
[1]   Registered Replication Report: Schooler and Engstler-Schooler (1990) [J].
Alogna, V. K. ;
Attaya, M. K. ;
Aucoin, P. ;
Bahnik, S. ;
Birch, S. ;
Birt, A. R. ;
Bornstein, B. H. ;
Bouwmeester, S. ;
Brandimonte, M. A. ;
Brown, C. ;
Buswell, K. ;
Carlson, C. ;
Carlson, M. ;
Chu, S. ;
Cislak, A. ;
Colarusso, M. ;
Colloff, M. F. ;
Dellapaolera, K. S. ;
Delvenne, J.-F. ;
Di Domenico, A. ;
Drummond, A. ;
Echterhoff, G. ;
Edlund, J. E. ;
Eggleston, C. M. ;
Fairfield, B. ;
Franco, G. ;
Gabbert, F. ;
Gamblin, B. W. ;
Garry, M. ;
Gentry, R. ;
Gilbert, E. A. ;
Greenberg, D. L. ;
Halberstadt, J. ;
Hall, L. ;
Hancock, P. J. B. ;
Hirsch, D. ;
Holt, G. ;
Jackson, J. C. ;
Jong, J. ;
Kehn, A. ;
Koch, C. ;
Kopietz, R. ;
Koerner, U. ;
Kunar, M. A. ;
Lai, C. K. ;
Langton, S. R. H. ;
Leite, F. P. ;
Mammarella, N. ;
Marsh, J. E. ;
McConnaughy, K. A. .
PERSPECTIVES ON PSYCHOLOGICAL SCIENCE, 2014, 9 (05) :556-578
[2]  
[Anonymous], 2016, What if there were no significance tests?
[3]   Mixed-effects modeling with crossed random effects for subjects and items [J].
Baayen, R. H. ;
Davidson, D. J. ;
Bates, D. M. .
JOURNAL OF MEMORY AND LANGUAGE, 2008, 59 (04) :390-412
[4]  
Balota DA, 2012, CUR ISS PSYCHOL LANG, P90
[5]   Metastudies for robust tests of theory [J].
Baribault, Beth ;
Donkin, Chris ;
Little, Daniel R. ;
Trueblood, Jennifer S. ;
Oravecz, Zita ;
van Ravenzwaaij, Don ;
White, Corey N. ;
De Boeck, Paul ;
Vandekerckhove, Joachim .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2018, 115 (11) :2607-2612
[6]   Random effects structure for confirmatory hypothesis testing: Keep it maximal [J].
Barr, Dale J. ;
Levy, Roger ;
Scheepers, Christoph ;
Tily, Harry J. .
JOURNAL OF MEMORY AND LANGUAGE, 2013, 68 (03) :255-278
[7]  
Bates D., 2014, LME4 LINEAR MIXEDEFF, DOI 10.18637/JSS.V067.I01
[8]   Redefine statistical significance [J].
Benjamin, Daniel J. ;
Berger, James O. ;
Johannesson, Magnus ;
Nosek, Brian A. ;
Wagenmakers, E.-J. ;
Berk, Richard ;
Bollen, Kenneth A. ;
Brembs, Bjoern ;
Brown, Lawrence ;
Camerer, Colin ;
Cesarini, David ;
Chambers, Christopher D. ;
Clyde, Merlise ;
Cook, Thomas D. ;
De Boeck, Paul ;
Dienes, Zoltan ;
Dreber, Anna ;
Easwaran, Kenny ;
Efferson, Charles ;
Fehr, Ernst ;
Fidler, Fiona ;
Field, Andy P. ;
Forster, Malcolm ;
George, Edward I. ;
Gonzalez, Richard ;
Goodman, Steven ;
Green, Edwin ;
Green, Donald P. ;
Greenwald, Anthony ;
Hadfield, Jarrod D. ;
Hedges, Larry V. ;
Held, Leonhard ;
Ho, Teck Hua ;
Hoijtink, Herbert ;
Hruschka, Daniel J. ;
Imai, Kosuke ;
Imbens, Guido ;
Ioannidis, John P. A. ;
Jeon, Minjeong ;
Jones, James Holland ;
Kirchler, Michael ;
Laibson, David ;
List, John ;
Little, Roderick ;
Lupia, Arthur ;
Machery, Edouard ;
Maxwell, Scott E. ;
McCarthy, Michael ;
Moore, Don ;
Morgan, Stephen L. .
NATURE HUMAN BEHAVIOUR, 2018, 2 (01) :6-10
[9]  
Bergelson E., 2017, QUANTIFYING SOURCES
[10]   The theoretical status of latent variables [J].
Borsboom, D ;
Mellenbergh, GJ ;
van Heerden, J .
PSYCHOLOGICAL REVIEW, 2003, 110 (02) :203-219