Towards Automated Multiple Choice Question Generation and Evaluation: Aligning with Bloom's Taxonomy

Cited by: 1
Authors
Hwang, Kevin [1 ]
Wang, Kenneth [1 ]
Alomair, Maryam [2 ]
Choa, Fow-Sen [2 ]
Chen, Lujie Karen [2 ]
Affiliations
[1] Glenelg High Sch, Glenelg, MD 21737 USA
[2] Univ Maryland Baltimore Cty, Baltimore, MD 21250 USA
Source
ARTIFICIAL INTELLIGENCE IN EDUCATION, PT II, AIED 2024 | 2024 / Vol. 14830
Keywords
automated question generation; GPT-4; Bloom's taxonomy; large language models; multiple choice question generation; ITEM WRITING FLAWS;
DOI
10.1007/978-3-031-64299-9_35
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Multiple Choice Questions (MCQs) are frequently used in educational assessments because of their efficiency in grading and providing feedback. However, manually generating MCQs has limitations and challenges. This study explores an AI-driven approach to creating and evaluating Bloom's Taxonomy-aligned college-level biology MCQs using a varied number of shots in few-shot prompting with GPT-4. Shots, or examples of correct prompt-response pairs, were sourced from previously published datasets containing educator-approved MCQs labeled with their Bloom's Taxonomy level and were matched to prompts via a maximal marginal relevance search. To obtain ground truths against which to compare GPT-4, three expert human evaluators with a minimum of 4 years of educational experience annotated a random sample of the generated questions with regard to relevance to the input prompt, classroom usability, and perceived Bloom's Taxonomy level. Furthermore, we explored the feasibility of an AI-driven evaluation approach that can rate question usability using the Item Writing Flaws (IWFs) framework. We conclude that GPT-4 generally shows promise in generating relevant and usable questions. However, more work needs to be done to improve Bloom-level alignment accuracy (the accuracy of alignment between GPT-4's target level and the actual level of the generated question). Moreover, we note a general inverse relationship between alignment accuracy and the number of shots. On the other hand, no clear trend between shot number and relevance/usability was observed. These findings shed light on automated question generation and assessment, presenting the potential for advancements in AI-driven educational evaluation methods.
Pages: 389-396
Page count: 8