An Examination of the Replicability of Angoff Standard Setting Results Within a Generalizability Theory Framework

Cited by: 15
Authors
Clauser, Jerome C. [1]
Margolis, Melissa J. [2]
Clauser, Brian E. [2]
Affiliations
[1] Amer Board Internal Med, Philadelphia, PA 19106 USA
[2] Natl Board Med Examiners, Measurement Consulting Serv, Philadelphia, PA 19104 USA
Keywords
EXAMINEE PERFORMANCE INFORMATION; JUDGMENTS; ERRORS; SCORES; IMPACT;
DOI
10.1111/jedm.12038
Chinese Library Classification number
G44 [Educational Psychology];
Discipline classification code
0402 ; 040202 ;
Abstract
Evidence of stable standard setting results over panels or occasions is an important part of the validity argument for an established cut score. Unfortunately, due to the high cost of convening multiple panels of content experts, standards often are based on the recommendation from a single panel of judges. This approach implicitly assumes that the variability across panels will be modest, but little evidence is available to support this assertion. This article examines the stability of Angoff standard setting results across panels. Data were collected for six independent standard setting exercises, with three panels participating in each exercise. The results show that although in some cases the panel effect is negligible, for four of the six data sets the panel facet represented a large portion of the overall error variance. Ignoring the often hidden panel/occasion facet can result in artificially optimistic estimates of the cut score stability. Results based on a single panel should not be viewed as a reasonable estimate of the results that would be found over multiple panels. Instead, the variability seen in a single panel can best be viewed as a lower bound of the expected variability when the exercise is replicated.
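The lower-bound argument in the abstract can be illustrated with a deliberately simplified sketch. It assumes a balanced design with judges nested within panels and ignores the item facet, so it is not the multivariate generalizability analysis reported in the article. The ratings, function names, and panel sizes below are hypothetical and chosen only for illustration. The sketch estimates a panel variance component and a judge-within-panel variance component from mean squares, then compares the standard error computed from judge variability alone with the standard error that also includes the panel component.

```python
from statistics import mean


def variance_components(panels):
    """Estimate (panel, judge-within-panel) variance components.

    `panels` is a list of panels; each panel is a list of judge-level
    cut scores. Assumes a balanced judges-nested-in-panels design.
    """
    n_p = len(panels)
    n_j = len(panels[0])
    if n_p < 2 or n_j < 2 or any(len(p) != n_j for p in panels):
        raise ValueError("need at least 2 balanced panels with at least 2 judges each")

    panel_means = [mean(p) for p in panels]
    grand_mean = mean(panel_means)

    # Mean square between panels and mean square for judges within panels.
    ms_panel = n_j * sum((m - grand_mean) ** 2 for m in panel_means) / (n_p - 1)
    ms_judge = sum(
        (x - m) ** 2 for p, m in zip(panels, panel_means) for x in p
    ) / (n_p * (n_j - 1))

    var_judge = ms_judge
    # Negative estimates are truncated at zero, a common convention.
    var_panel = max((ms_panel - ms_judge) / n_j, 0.0)
    return var_panel, var_judge, n_j


def standard_errors(panels):
    """Return (SE from judge variance only, SE including the panel component)."""
    var_panel, var_judge, n_j = variance_components(panels)
    se_within = (var_judge / n_j) ** 0.5
    se_replication = (var_panel + var_judge / n_j) ** 0.5
    return se_within, se_replication


# Hypothetical judge-level cut scores (percent correct) for three panels.
ratings = [
    [62.0, 65.0, 63.0, 66.0],
    [70.0, 72.0, 69.0, 73.0],
    [58.0, 61.0, 60.0, 57.0],
]

var_panel, var_judge, n_j = variance_components(ratings)
se_within, se_replication = standard_errors(ratings)
print(f"panel variance component: {var_panel:.2f}")
print(f"judge variance component: {var_judge:.2f}")
print(f"SE from a single panel's judges: {se_within:.2f}")
print(f"SE including the panel facet:    {se_replication:.2f}")
```

Because the panel component cannot be negative under this estimator, the second standard error is never smaller than the first, which is the sense in which single-panel variability acts as a lower bound in this toy setting.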
Pages: 127-140
Page count: 14
Related papers
22 in total
  • [1] Angoff W.H., 1971, ED MEASUREMENT, V2nd, P508
  • [2] [Anonymous], 1972, The dependability of behavioural measurements: Theory of generalizability for scores and profiles
  • [3] [Anonymous], 1999, STAND ED PSYCH TESTS
  • [4] [Anonymous], 2001, Generalizability Theory
  • [5] Brennan R.L., 1980, APPL PSYCH MEAS, P219, DOI 10.1177/014662168000400209
  • [6] Brennan R.L., 2001, Manual for mGENOVA
  • [7] Brennan R.L., 1995, JOINT C STANDARD SET, P269
  • [8] Camilli G, 2001, SETTING PERFORMANCE STANDARDS, P445
  • [9] Multivariate generalizability analysis of the impact of training and examinee performance information on judgments made in an Angoff-style standard-setting procedure
    Clauser, BE
    Swanson, DB
    Harik, P
    [J]. JOURNAL OF EDUCATIONAL MEASUREMENT, 2002, 39 (04) : 269 - 290
  • [10] The Effect of Data Format on Integration of Performance Data Into Angoff Judgments
    Clauser, Brian E.
    Mee, Janet
    Margolis, Melissa J.
    [J]. INTERNATIONAL JOURNAL OF TESTING, 2013, 13 (01) : 65 - 85