Comparative judgement as a research tool: A meta-analysis of application and reliability

Times Cited: 0
Authors
Kinnear, George [1 ,2 ]
Jones, Ian [3 ]
Davies, Ben [4 ]
Affiliations
[1] Univ Edinburgh, Sch Math, Edinburgh, Scotland
[2] Univ Edinburgh, Maxwell Inst Math Sci, Edinburgh, Scotland
[3] Loughborough Univ, Loughborough, England
[4] Univ Southampton, Southampton, England
Keywords
Comparative judgement; Bradley-Terry; Reliability
DOI
10.3758/s13428-025-02744-w
Chinese Library Classification
B841 [Psychological research methods]
Subject Classification Code
040201
Abstract
Comparative judgement (CJ) provides methods for constructing measurement scales by asking assessors to make a series of pairwise comparisons of the artefacts or representations to be scored. Researchers using CJ need to decide how many assessors to recruit and how many comparisons to collect. They also need to gauge the reliability of the resulting measurement scale, for which two different estimates are in widespread use: scale separation reliability (SSR) and split-halves reliability (SHR). Previous research has offered guidance on these issues, but that guidance has either had limited empirical support or been focused solely on education research. In this paper, we offer guidance based on our analysis of 101 CJ datasets collated from previous research across a range of disciplines. We present two novel findings with substantive implications for future CJ research. First, we find that collecting ten comparisons for every representation is generally sufficient, a more lenient guideline than those previously published. Second, we conclude that SSR can serve as a reliable proxy for inter-rater reliability, but we recommend that researchers use a higher threshold of .8, rather than the current standard of .7.
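As a concrete illustration of the scale construction and split-halves procedure the abstract describes, the sketch below simulates pairwise comparisons under a Bradley-Terry model, fits log-strengths with the standard MM (minorisation-maximisation) updates, and estimates split-halves reliability by correlating the scales fitted to two halves of the judgements. The simulation parameters, pseudo-count regularisation, and helper names are illustrative assumptions, not the authors' actual analysis pipeline.

```python
import numpy as np

def fit_bradley_terry(wins, n_iter=500, eps=0.05):
    """Fit Bradley-Terry log-strengths from a wins matrix via MM updates.

    wins[i, j] = number of comparisons in which item i beat item j.
    A small pseudo-count `eps` (an assumption, not from the paper)
    regularises items with very few wins.
    """
    w = wins + eps
    np.fill_diagonal(w, 0.0)
    total = w + w.T                      # comparisons between each pair
    n = w.shape[0]
    p = np.ones(n)
    for _ in range(n_iter):
        wi = w.sum(axis=1)               # (pseudo-)wins of each item
        denom = (total / (p[:, None] + p[None, :])).sum(axis=1)
        p = wi / denom                   # standard MM update
        p /= p.mean()                    # BT is identified only up to scale
    logp = np.log(p)
    return logp - logp.mean()            # centred log-strengths

def simulate_comparisons(theta, n_comp, rng):
    """Simulate n_comp random pairwise comparisons under a BT model."""
    n = len(theta)
    wins = np.zeros((n, n))
    for _ in range(n_comp):
        i, j = rng.choice(n, size=2, replace=False)
        p_win = 1.0 / (1.0 + np.exp(theta[j] - theta[i]))
        if rng.random() < p_win:
            wins[i, j] += 1
        else:
            wins[j, i] += 1
    return wins

rng = np.random.default_rng(1)
theta = np.linspace(-2, 2, 20)           # "true" quality of 20 representations
half1 = simulate_comparisons(theta, 300, rng)
half2 = simulate_comparisons(theta, 300, rng)

s1 = fit_bradley_terry(half1)
s2 = fit_bradley_terry(half2)
shr = np.corrcoef(s1, s2)[0, 1]          # split-halves reliability estimate
full = fit_bradley_terry(half1 + half2)  # scale from all judgements
```

With roughly 30 comparisons per item in each half, the two half-scales correlate strongly with each other and with the simulated quality, which is the intuition behind using a fixed number of comparisons per representation as a sufficiency guideline.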
Pages: 16