Multilevel Reliability Measures of Latent Scores Within an Item Response Theory Framework

Cited by: 7
Authors
Cho, Sun-Joo [1]
Shen, Jianhong [2]
Naveiras, Matthew [1]
Affiliations
[1] Vanderbilt Univ, Nashville, TN 37203 USA
[2] axialHealthcare, Nashville, TN USA
Keywords
Bayesian analysis; item response theory; marginal maximum likelihood estimation; multilevel model; multiple imputation; reliability coefficient; MAXIMUM-LIKELIHOOD-ESTIMATION; CROSS-LEVEL MEASUREMENT; SIGNAL NOISE RATIO; MEASUREMENT INVARIANCE; INFORMATION FUNCTION; IRT; ABILITY; MODEL; PARAMETER; ACCURACY;
DOI
10.1080/00273171.2019.1596780
Chinese Library Classification (CLC) number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
This paper evaluated multilevel reliability measures in two-level nested designs (e.g., students nested within teachers) within an item response theory framework. A simulation study was implemented to investigate the behavior of the multilevel reliability measures and the uncertainty associated with the measures in various multilevel designs regarding the number of clusters, cluster sizes, and intraclass correlations (ICCs), and in different test lengths, for two parameterizations of multilevel item response models with separate item discriminations or the same item discrimination over levels. Marginal maximum likelihood estimation (MMLE)-multiple imputation and Bayesian analysis were employed to evaluate the accuracy of the multilevel reliability measures and the empirical coverage rates of Monte Carlo (MC) confidence or credible intervals. Considering the accuracy of the multilevel reliability measures and the empirical coverage rate of the intervals, the results lead us to generally recommend MMLE-multiple imputation. In the model with separate item discriminations over levels, marginally acceptable accuracy of the multilevel reliability measures and empirical coverage rate of the MC confidence intervals were found in a limited condition (200 clusters, a cluster size of 30, an ICC of .2, and 40 items) under MMLE-multiple imputation. In the model with the same item discrimination over levels, the accuracy of the multilevel reliability measures and the empirical coverage rate of the MC confidence intervals were acceptable in all multilevel designs we considered with 40 items under MMLE-multiple imputation. We discuss these findings and provide guidelines for reporting multilevel reliability measures.
Pages: 856-881
Page count: 26