The stability of rater severity in large-scale assessment programs

Cited by: 63
Authors
Congdon, PJ
McQueen, J
DOI
10.1111/j.1745-3984.2000.tb01081.x
Chinese Library Classification number
G44 [Educational Psychology]
Discipline classification codes
0402; 040202
Abstract
The purpose of this study was to investigate the stability of rater severity over an extended rating period. Multifaceted Rasch analysis was applied to ratings of 16 raters on writing performances of 8,285 elementary school students. Each performance was rated by two trained raters over a period of seven rating days. Performances rated on the first day were re-rated at the end of the rating period. Statistically significant differences between raters were found within each day and for all days combined. Daily estimates of the relative severity of individual raters were found to differ significantly from single average estimates for the whole rating period. For 10 raters, severity estimates on the last day were significantly different from estimates on the first day. These findings cast doubt on the practice of using a single calibration of rater severity as the basis for adjustment of person measures.
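The abstract does not spell out the model or the significance test used. As a minimal sketch, assuming the rating-scale form of the many-facet Rasch model associated with Linacre's Facets, log(P_nijk / P_nij(k-1)) = B_n - D_i - C_j - F_k (B_n = student ability, D_i = task difficulty, C_j = rater severity, F_k = step threshold), the code below shows how a rater's severity shifts expected scores, and how a Wald-type z statistic can compare two independent severity estimates (for example, day 1 against day 7). The function names and all numeric values are hypothetical illustrations, not the authors' procedure or data.

```python
import math


def category_probabilities(ability, difficulty, severity, thresholds):
    """Rating-scale many-facet Rasch model (assumed form).

    log(P_k / P_{k-1}) = ability - difficulty - severity - thresholds[k-1]
    for categories k = 1..m; category 0 is the baseline.
    Returns m + 1 category probabilities that sum to 1.
    """
    # Cumulative sums of the adjacent-category logits give unnormalized log-probabilities.
    log_numerators = [0.0]
    for tau in thresholds:
        log_numerators.append(log_numerators[-1] + (ability - difficulty - severity - tau))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_numerators)
    weights = [math.exp(v - top) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(probs):
    """Expected rating: sum of category index times its probability."""
    return sum(k * p for k, p in enumerate(probs))


def severity_change_z(sev_a, se_a, sev_b, se_b):
    """Wald-type z for the difference between two independent severity estimates (logits)."""
    return (sev_a - sev_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# Hypothetical values for illustration only (not data from the study).
thresholds = [-1.5, 0.0, 1.5]  # hypothetical 4-category rating scale
lenient = category_probabilities(ability=0.5, difficulty=0.0, severity=-0.4, thresholds=thresholds)
harsh = category_probabilities(ability=0.5, difficulty=0.0, severity=0.4, thresholds=thresholds)
print(round(expected_score(lenient), 2), round(expected_score(harsh), 2))

# Hypothetical first-day vs last-day severity estimates and standard errors for one rater.
z = severity_change_z(sev_a=0.62, se_a=0.08, sev_b=0.31, se_b=0.09)
print(round(z, 2), abs(z) > 1.96)
```

Because severity enters the model with a negative sign, a harsher rater (higher C_j) lowers the expected score for the same performance. This is why a severity calibrated once and applied to every day would mis-adjust person measures if, as the abstract reports, severity drifts between rating days.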
Pages: 163-178
Page count: 16