Comparison of generative AI performance on undergraduate and postgraduate written assessments in the biomedical sciences

Cited by: 4
Authors
Williams, Andrew [1]
Affiliations
[1] UCL, Dept Educ, Div Med, London WC1E 6BT, England
Source
INTERNATIONAL JOURNAL OF EDUCATIONAL TECHNOLOGY IN HIGHER EDUCATION | 2024 / Vol. 21 / Issue 01
Keywords
Assessment; Artificial intelligence; Higher education; Academic writing; ChatGPT; Essay; Biomedical science; Medicine;
DOI
10.1186/s41239-024-00485-y
Chinese Library Classification number
G40 [Education];
Discipline classification code
040101 ; 120403 ;
Abstract
The value of generative AI tools in higher education has received considerable attention. Although there are many proponents of their value as learning tools, many are concerned about academic integrity and the use of these tools by students to compose written assessments. This study evaluates and compares the output of three commonly used generative AI tools: ChatGPT, Bing and Bard. Each AI tool was prompted with an essay question from undergraduate (UG) level 4 (year 1), level 5 (year 2), level 6 (year 3) and postgraduate (PG) level 7 biomedical sciences courses. Anonymised AI-generated output was then evaluated by four independent markers according to specified marking criteria and matched to the level descriptors of the UK Frameworks for Higher Education Qualifications (FHEQ). Percentage scores and ordinal grades were given for each marking criterion across AI-generated papers, inter-rater reliability was calculated using Kendall's coefficient of concordance, and generative AI performance was ranked. Across all UG and PG levels, ChatGPT performed better than Bing or Bard in areas of scientific accuracy, scientific detail and context. All AI tools performed consistently well at PG level compared to UG level, although only ChatGPT consistently met levels of high attainment at all UG levels. ChatGPT and Bing did not provide adequate references, while Bing falsified references. In conclusion, generative AI tools are useful for providing scientific information consistent with the academic standards required of students in written assignments. These findings have broad implications for the design, implementation and grading of written assessments in higher education.
Pages: 22