Are your random effects normal? A simulation study of methods for estimating whether subjects or items come from more than one population by examining the distribution of random effects in mixed-effects logistic regression

Times cited: 1
Authors
Houghton, Zachary N. [1 ,2 ]
Kapatsinski, Vsevolod [2 ]
Affiliations
[1] Univ Calif Davis, Dept Linguist, Kerr Hall, Davis, CA 95616 USA
[2] Univ Oregon, Dept Linguist, 1290 Univ Oregon, Eugene, OR 97403 USA
Keywords
Mixed-effects models; Logistic regression; Conditional inference trees; Random effects; Misspecification; Individual differences; INDIVIDUAL-DIFFERENCES; R PACKAGE; EXCESS MASS; MISSPECIFICATION; MODELS; ERROR; TESTS; BIAS;
DOI
10.3758/s13428-023-02287-y
Chinese Library Classification (CLC) number
B841 [Research methods in psychology]
Discipline classification code
040201
Abstract
With mixed-effects regression models becoming a mainstream tool for every psycholinguist, there has become an increasing need to understand them more fully. In the last decade, most work on mixed-effects models in psycholinguistics has focused on properly specifying the random-effects structure to minimize error in evaluating the statistical significance of fixed-effects predictors. The present study examines a potential misspecification of random effects that has not been discussed in psycholinguistics: violation of the single-subject-population assumption, in the context of logistic regression. Estimated random-effects distributions in real studies often appear to be bi- or multimodal. However, there is no established way to estimate whether a random-effects distribution corresponds to more than one underlying population, especially in the more common case of a multivariate distribution of random effects. We show that violations of the single-subject-population assumption can usually be detected by assessing the (multivariate) normality of the inferred random-effects structure, unless the data show quasi-separability, i.e., many subjects or items show near-categorical behavior. In the absence of quasi-separability, several clustering methods are successful in determining which group each participant belongs to. The BIC difference between a two-cluster and a one-cluster solution can be used to determine that subjects (or items) do not come from a single population. This then allows the researcher to define and justify a new post hoc variable specifying the groups to which participants or items belong, which can be incorporated into regression analysis.
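The BIC comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a univariate case (for example, per-subject random-intercept estimates extracted from an already fitted mixed-effects logistic regression), whereas the abstract stresses that the multivariate case is more common, and it uses a two-component Gaussian mixture fitted by EM as a stand-in for the clustering methods the study evaluates. All function names here (`bic_one_cluster`, `bic_two_clusters`, `bic_difference`) are hypothetical, and the simulated intercepts are made-up illustration data.

```python
import math
import random


def normal_logpdf(x, mu, sigma):
    """Log density of a normal distribution with mean mu and sd sigma."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion; lower values are preferred."""
    return n_params * math.log(n_obs) - 2 * loglik


def bic_one_cluster(xs):
    """BIC of a single normal distribution (2 parameters: mean, sd)."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    loglik = sum(normal_logpdf(x, mu, sigma) for x in xs)
    return bic(loglik, 2, n)


def bic_two_clusters(xs, n_iter=200):
    """BIC of a two-component normal mixture fitted by EM (5 parameters)."""
    n = len(xs)
    ordered = sorted(xs)
    grand_mean = sum(xs) / n
    sd0 = math.sqrt(sum((x - grand_mean) ** 2 for x in xs) / n)
    # Assumed initialisation: means at the quartiles, equal weights.
    mu = [ordered[n // 4], ordered[(3 * n) // 4]]
    sigma = [sd0, sd0]
    weight = [0.5, 0.5]
    sd_floor = 1e-3 * sd0  # guards against a component collapsing onto one point

    def component_logs(x):
        return [math.log(weight[k]) + normal_logpdf(x, mu[k], sigma[k]) for k in range(2)]

    for _ in range(n_iter):
        # E-step: posterior probability that each value belongs to each cluster.
        resp = []
        for x in xs:
            logs = component_logs(x)
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: update weights, means, and standard deviations.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weight[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigma[k] = max(math.sqrt(var), sd_floor)

    # Mixture log-likelihood, computed with the log-sum-exp trick for stability.
    loglik = 0.0
    for x in xs:
        logs = component_logs(x)
        top = max(logs)
        loglik += top + math.log(sum(math.exp(v - top) for v in logs))
    return bic(loglik, 5, n)


def bic_difference(xs):
    """One-cluster BIC minus two-cluster BIC; positive favours two clusters."""
    return bic_one_cluster(xs) - bic_two_clusters(xs)


if __name__ == "__main__":
    random.seed(1)
    # Hypothetical random-intercept estimates from two subject populations.
    intercepts = ([random.gauss(-2.0, 0.5) for _ in range(100)]
                  + [random.gauss(2.0, 0.5) for _ in range(100)])
    print(round(bic_difference(intercepts), 1))
```

Under the lower-is-better BIC convention used here, a clearly positive difference indicates that the two-cluster solution is preferred despite its three extra parameters. How large the difference must be before concluding that subjects come from more than one population is not stated in the abstract and is left open in this sketch.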
Pages: 5557-5587
Number of pages: 31