Assessing the Quality of Multiple-Choice Questions Using GPT-4 and Rule-Based Methods

Cited by: 11
Authors: Moore, Steven [1]; Nguyen, Huy A. [1]; Chen, Tianying [1]; Stamper, John [1]
Affiliation: [1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Source: Responsive and Sustainable Educational Futures, EC-TEL 2023 | 2023, Vol. 14200
Keywords: Question evaluation; Question quality; Rule-based; GPT-4; Item-writing flaws
DOI: 10.1007/978-3-031-42682-7_16
CLC classification: TP39 [Applications of computers]
Subject classification codes: 081203; 0835
Abstract
Multiple-choice questions with item-writing flaws can negatively impact student learning and skew analytics. These flaws are often present in student-generated questions, making it difficult to assess their quality and suitability for classroom use. Existing methods for evaluating multiple-choice questions often focus on machine readability metrics, without considering their intended use within course materials and their pedagogical implications. In this study, we compared the performance of a rule-based method we developed to a machine-learning based method utilizing GPT-4 for the task of automatically assessing multiple-choice questions based on 19 common item-writing flaws. By analyzing 200 student-generated questions from four different subject areas, we found that the rule-based method correctly detected 91% of the flaws identified by human annotators, compared to 79% for GPT-4. We demonstrated the effectiveness of the two methods in identifying common item-writing flaws present in the student-generated questions across different subject areas. The rule-based method can accurately and efficiently evaluate multiple-choice questions from multiple domains, outperforming GPT-4 and going beyond existing metrics that do not account for the educational use of such questions. Finally, we discuss the potential for using these automated methods to improve the quality of questions based on the identified flaws.
Pages: 229-245
Page count: 17
Related articles (20 total)
  • [1] Noda, Ryunosuke; Tanabe, Kenichiro; Ichikawa, Daisuke; Shibagaki, Yugo. GPT-4's performance in supporting physician decision-making in nephrology multiple-choice questions. Scientific Reports, 15 (1).
  • [2] Ch'en, Peter Y.; Day, Wesley; Pekson, Ryan C.; Barrientos, Juan; Burton, William B.; Ludwig, Allison B.; Jariwala, Sunit P.; Cassese, Todd. GPT-4 generated answer rationales to multiple choice assessment questions in undergraduate medical education. BMC Medical Education, 2025, 25 (01).
  • [3] Rodrigues, L.; Dwan Pereira, F.; Cabral, L.; Gašević, D.; Ramalho, G.; Ferreira Mello, R. Assessing the quality of automatic-generated short answers using GPT-4. Computers and Education: Artificial Intelligence, 2024, 7.
  • [4] Clifton, Sandra L.; Schriner, Cheryl L. Assessing the Quality of Multiple-Choice Test Items. Nurse Educator, 2010, 35 (01): 12-16.
  • [5] Stroop, Anna; Stroop, Tabea; Alsofy, Samer Zawy; Wegner, Moritz; Nakamura, Makoto; Stroop, Ralf. Assessing GPT-4's accuracy in answering clinical pharmacological questions on pain therapy. British Journal of Clinical Pharmacology, 2025.
  • [6] Danh, Tina; Desiderio, Tamara; Herrmann, Victoria; Lyons, Heather M.; Patrick, Frankie; Wantuch, Gwendolyn A.; Dell, Kamila A. Evaluating the quality of multiple-choice questions in a NAPLEX preparation book. Currents in Pharmacy Teaching and Learning, 2020, 12 (10): 1188-1193.
  • [7] Jui, Jayati H.; Hauskrecht, Milos. Uncovering the Effects of Genes, Proteins, and Medications on Functions of Wound Healing: A Dependency Rule-Based Text Mining Approach Leveraging GPT-4 based Evaluation. 2023 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI), 2023.
  • [8] Moore, Steven; Nguyen, Huy A.; Bier, Norman; Domadia, Tanvi; Stamper, John. Assessing the Quality of Student-Generated Short Answer Questions Using GPT-3. Educating for a New Future: Making Sense of Technology-Enhanced Learning Adoption, EC-TEL 2022, 2022, 13450: 243-257.
  • [9] Alnefaie, Sarah; Atwell, Eric; Alsalka, Mohammed Ammar. Using the Retrieval-Augmented Generation Technique to Improve the Performance of GPT-4 in Answering Quran Questions. 2024 6th International Conference on Natural Language Processing (ICNLP), 2024: 377-381.
  • [10] Moore, Steven; Fang, Ellen; Nguyen, Huy A.; Stamper, John. Crowdsourcing the Evaluation of Multiple-Choice Questions Using Item-Writing Flaws and Bloom's Taxonomy. Proceedings of the Tenth ACM Conference on Learning @ Scale (L@S 2023), 2023: 25-34.