Towards assessing the credibility of chatbot responses for technical assessments in higher education

Cited by: 0
Authors
Murali, Ritwik [1 ]
Dhanalakshmy, Dhanya M. [1 ]
Avudaiappan, Veeramanohar [2 ]
Sivakumar, Gayathri [1 ]
Affiliations
[1] Amrita Vishwa Vidyapeetham, Dept Comp Sci & Engn, Amrita Sch Comp, Coimbatore, India
[2] Amrita Vishwa Vidyapeetham, Dept Elect & Elect Engn, Amrita Sch Comp, Coimbatore, India
Source
2024 IEEE GLOBAL ENGINEERING EDUCATION CONFERENCE, EDUCON 2024 | 2024
Keywords
AI Chat-bots; Large Language Models (LLMs); Education; Generative AI;
DOI
10.1109/EDUCON60312.2024.10578934
CLC number
TP39 [Computer applications];
Discipline classification codes
081203 ; 0835 ;
Abstract
A recent challenge in higher education is conveying the importance of understanding concepts over rote learning. This challenge has grown in complexity with the arrival of chatbots based on large language models (LLMs). Students increasingly treat such AI-based chatbots as "sources of wisdom" rather than using them as learning aids. Despite disclaimers by the LLM creators, many students turn to chatbots for answers to almost all learning assignments. This work explores the extent to which LLM responses can support student learning in technical education. By examining the contradictions between student answers and LLM-generated responses, it probes the limitations of LLM-based environments in providing acceptable answers for assessments, specifically within the computer science engineering domain. While numerous studies have concentrated on ChatGPT, it is essential to consider the diverse range of alternative chatbots accessible online that students may also use. This work therefore considers five popular AI-based chatbots. Since the prompt is the prime factor shaping a chatbot's response, responses were collected using two different prompting techniques. The chatbot responses were evaluated against actual student responses by multiple reviewers to gauge their suitability as student answers. Both the students and all chatbots were given questions aligned with Bloom's taxonomy levels (BTL) 1 to 4 in three different subjects. Each course included a diverse range of questions: text-based questions, mathematical problems, and programming questions. The results show that the chatbot responses were acceptable for low-BTL questions but failed to answer convincingly when asked for an algorithm. Overall, chatbot performance (across the tested LLMs) was below average when the question set covered BTL 1-4.
However, since the answers up to BTL 2 were acceptable, LLM-based chatbot answers barely passed one or two of the three subjects (with the best performers scoring near the pass mark). Based on these results, it can be concluded that LLM-based chatbots cannot be depended on for higher-order learning, but can be used to aid students who are struggling to pass basic courses.
Pages: 9