The development and maintenance of rating quality in performance writing assessment: A longitudinal study of new and experienced raters

Cited by: 57
Author
Lim, Gad S. [1]
Affiliation
[1] Univ Cambridge ESOL Examinat, Cambridge, England
Keywords
inter-/intra-rater reliability; multi-faceted Rasch analysis; rater development; rater expertise; rater training; writing assessment; FRAMEWORK; TESTS; SCALE
DOI
10.1177/0265532211406422
Chinese Library Classification (CLC) number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
Raters are central to writing performance assessment, and rater development (training, experience, and expertise) involves a temporal dimension. However, few studies have examined the rating performance of new and experienced raters longitudinally over multiple time points. This study uses operational data from the writing section of the MELAB (n = 20,662 ratings), an international English proficiency examination, to investigate the rating quality of new and experienced raters over three time periods of 12 to 21 months. Rating quality was operationalized in terms of rater severity and consistency, both of which were estimated using multi-facet Rasch methodology. Results indicate that, within one particular rating context, (1) novice raters, where they initially differ in performance, learn to rate appropriately relatively quickly, (2) raters are able to maintain rating quality over time, and (3) rating volume and rating quality may be related. Implications for rater preparation, rater certification, and the notion of the expert rater are discussed.
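The abstract says rater severity and consistency were estimated with multi-facet Rasch methodology. As a rough illustration only, the sketch below assumes the common rating scale form of the many-facet Rasch model, log(P_k / P_{k-1}) = B_n - D_i - C_j - F_k (examinee ability, task difficulty, rater severity, category threshold). It treats infit and outfit mean-squares as the consistency index. The record does not specify which facets or fit statistics the study actually used, and all function names, thresholds, and ratings here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def category_probs(ability: float, severity: float,
                   thresholds: Sequence[float], difficulty: float = 0.0) -> List[float]:
    """Score-category probabilities under an ASSUMED rating scale form of the
    many-facet Rasch model:
        log(P_k / P_{k-1}) = ability - difficulty - severity - thresholds[k-1]
    Categories run from 0 to len(thresholds).
    """
    # Category 0 has log-numerator 0; each further category adds one step logit.
    log_numerators = [0.0]
    running = 0.0
    for tau in thresholds:
        running += ability - difficulty - severity - tau
        log_numerators.append(running)
    # Subtract the maximum before exponentiating to avoid overflow.
    top = max(log_numerators)
    weights = [math.exp(v - top) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


def expected_and_variance(probs: Sequence[float]) -> Tuple[float, float]:
    """Model-expected score and its variance for a single rating."""
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var


def rater_fit(observations: Sequence[Tuple[float, int]], severity: float,
              thresholds: Sequence[float]) -> Tuple[float, float]:
    """Infit and outfit mean-squares for one rater.

    observations: (examinee ability, observed score) pairs; abilities are
    treated as already known, which is a simplification for illustration.
    """
    sum_sq_resid = 0.0  # infit numerator (information-weighted)
    sum_var = 0.0       # infit denominator
    sum_z_sq = 0.0      # outfit: sum of squared standardized residuals
    for ability, score in observations:
        expected, var = expected_and_variance(
            category_probs(ability, severity, thresholds))
        resid = score - expected
        sum_sq_resid += resid ** 2
        sum_var += var
        sum_z_sq += resid ** 2 / var
    infit = sum_sq_resid / sum_var
    outfit = sum_z_sq / len(observations)
    return infit, outfit


if __name__ == "__main__":
    steps = [-1.0, 0.0, 1.0]  # hypothetical thresholds for a 0-3 scale
    # The same examinee (ability 0.0) scored by a lenient and a severe rater.
    for c in (-1.0, 1.0):
        e, _ = expected_and_variance(category_probs(0.0, c, steps))
        print(f"severity {c:+.1f}: expected score {e:.2f}")
    # A rater whose scores track ability vs. one whose scores run against it.
    steady = [(-2.0, 0), (-1.0, 1), (0.0, 1), (1.0, 2), (2.0, 3)]
    erratic = [(-2.0, 3), (-1.0, 2), (0.0, 1), (1.0, 1), (2.0, 0)]
    print("steady  infit/outfit:", rater_fit(steady, 0.0, steps))
    print("erratic infit/outfit:", rater_fit(erratic, 0.0, steps))
```

Under these assumptions, raising the severity parameter lowers the expected score for the same examinee, and the rater whose scores run against examinee ability shows much larger mean-squares than the steady rater. An operational analysis would estimate all facets jointly with dedicated many-facet Rasch software rather than treat abilities as known.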
Pages: 543-560
Page count: 18