The differences among three-, four-, and five-option-item formats in the context of a high-stakes English-language listening test

Cited by: 15
Authors
Lee, HyeSun [1]
Winke, Paula [2]
Affiliations
[1] Univ Nebraska Lincoln, Lincoln, NE USA
[2] Michigan State Univ, Dept Linguist & Languages, E Lansing, MI 48824 USA
Keywords
Educational measurement; high-stakes testing; item formatting; multiple-choice testing; MULTIPLE-CHOICE TESTS; OPTIMAL NUMBER; TEST-WISENESS; TEST ITEM; OPTIONS; RELIABILITY; ALTERNATIVES; 4-CHOICE; VALIDITY; 3-OPTION;
DOI
10.1177/0265532212451235
Chinese Library Classification (CLC) number
H0 [Linguistics]
Subject classification codes
030303; 0501; 050102
Abstract
We adapted three practice College Scholastic Ability Tests (CSAT) of English listening, each with five-option items, to create four- and three-option versions by asking 73 Korean speakers or learners of English to eliminate the least plausible options in two rounds. Two hundred and sixty-four Korean high school English-language learners formed three groups. Each took three of the nine tests, one with five-option items, one with four-, and one with three-, with administrations counterbalanced to control for order and practice effects. Mean test scores of the three-option tests were significantly higher than those of four- and five-option tests. While no difference was found in mean item discriminations across the three different test formats, reliability coefficients showed inconsistent patterns depending on the number of options and test versions. One possible interpretation of the low correlations among the scores of three test formats is that items with different numbers of options tap into skills other than listening. The findings suggest that statistically, three options may or may not be optimal depending on the point of view taken - from that of the test score users, or from that of the test stakeholders. Test developers must consider multiple statistical, affective, and contextual factors in determining the optimal number of options.
Pages: 99-123
Page count: 25