Revisiting rating scale development for rater-mediated language performance assessments: Modelling construct and contextual choices made by scale developers

Cited by: 12
Authors
Knoch, Ute [1]
Deygers, Bart [2]
Khamboonruang, Apichat [3]
Affiliations
[1] Univ Melbourne, Melbourne, Vic, Australia
[2] Katholieke Univ Leuven, Leuven, Belgium
[3] Mahasarakham Univ, Talat, Thailand
Keywords
Rating scale development; rating scale validation; rubric development; scale design; sources of scale construct
DOI
10.1177/0265532221994052
Chinese Library Classification
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
Rating scale development in the field of language assessment is often considered in dichotomous ways: It is assumed to be guided either by expert intuition or by drawing on performance data. Even though quite a few authors have argued that rating scale development is rarely so easily classifiable, this dyadic view has dominated language testing research for over a decade. In this paper we refine the dominant model of rating scale development by drawing on a corpus of 36 studies identified in a systematic review. We present a model showing the different sources of scale construct in the corpus. In the discussion, we argue that rating scale designers, just like test developers more broadly, need to start by determining the purpose of the test, the relevant policies that guide test development and score use, and the intended score use when considering the design choices available to them. These include considering the impact of such sources on the generalizability of the scores, the precision of the post-test predictions that can be made about test takers' future performances, and scoring reliability. The most important contribution of the model is that it gives rating scale developers a framework to consider prior to starting scale development and validation activities.
Pages: 602-626
Number of pages: 25