Crowdsourcing the Evaluation of Multiple-Choice Questions Using Item-Writing Flaws and Bloom's Taxonomy

Cited by: 2
Authors
Moore, Steven [1 ]
Fang, Ellen [1 ]
Nguyen, Huy A. [1 ]
Stamper, John [1 ]
Affiliation
[1] Carnegie Mellon University, Human-Computer Interaction Institute, Pittsburgh, PA 15213, USA
Source
PROCEEDINGS OF THE TENTH ACM CONFERENCE ON LEARNING @ SCALE, L@S 2023 | 2023
Keywords
crowdsourcing; learnersourcing; question evaluation; question quality; question generation
DOI
10.1145/3573051.3593396
CLC Number
TP39 [Computer Applications];
Subject Classification Number
081203; 0835
Abstract
Multiple-choice questions, which are widely used in educational assessments, have the potential to negatively impact student learning and skew analytics when they contain item-writing flaws. Existing methods for evaluating multiple-choice questions in educational contexts tend to focus primarily on machine readability metrics, such as grammar, syntax, and formatting, without considering the intended use of the questions within course materials and their pedagogical implications. In this study, we present the results of crowdsourcing the evaluation of multiple-choice questions based on 15 common item-writing flaws. Through analysis of 80 crowdsourced evaluations on questions from the domains of calculus and chemistry, we found that crowdworkers were able to accurately evaluate the questions, matching 75% of the expert evaluations across multiple questions. They were able to correctly distinguish between two levels of Bloom's Taxonomy for the calculus questions, but were less accurate for chemistry questions. We discuss how to scale this question evaluation process and the implications it has across other domains. This work demonstrates how crowdworkers can be leveraged in the quality evaluation of educational questions, regardless of prior experience or domain knowledge.
Pages: 25-34
Number of pages: 10