Use me wisely: AI-driven assessment for LLM prompting skills development

Cited: 0
Authors
Ognibene, Dimitri [1 ,5 ]
Donabauer, Gregor [2 ]
Theophilou, Emily [3 ]
Koyuturk, Cansu [1 ]
Yavari, Mona [1 ]
Bursic, Sathya [1 ]
Telari, Alessia [1 ]
Testa, Alessia [1 ]
Boiano, Raffaele [4 ]
Taibi, Davide [5 ]
Hernandez-Leo, Davinia [3 ]
Kruschwitz, Udo [2 ]
Ruskov, Martin [6 ]
Affiliations
[1] Univ Milano Bicocca, Milan, Italy
[2] Univ Regensburg, Regensburg, Germany
[3] Univ Pompeu Fabra, Barcelona, Spain
[4] Politecn Milan, Milan, Italy
[5] Natl Res Council Italy CNR, Rome, Italy
[6] Univ Milan, Milan, Italy
Source
EDUCATIONAL TECHNOLOGY & SOCIETY | 2025, Vol. 28, No. 3
Keywords
Artificial intelligence in education; Computational thinking; Natural language processing; Data science applications in education;
DOI
10.30191/ETS.202507_28(3).SP12
Chinese Library Classification
G40 [Education];
Discipline Codes
040101; 120403;
Abstract
Prompting with large language model (LLM) powered chatbots, such as ChatGPT, is adopted in a variety of tasks and processes across different domains. Given the intrinsic complexity of LLMs, effective prompting is not as straightforward as anticipated, which highlights the need for novel educational and support methods that are both widely accessible and seamlessly integrated into task workflows. However, LLM prompting depends strongly on the specific task and domain, reducing the usefulness of generic methods. We investigate whether LLM-based methods can support learning assessment using ad-hoc guidelines and an extremely limited number of annotated prompt samples. In our framework, guidelines are transformed into features to be detected in learners' prompts. The descriptions of these features, together with annotated sample prompts, are used to create few-shot learning detectors. We compare various configurations of these few-shot detectors, testing three state-of-the-art LLMs and derived ensemble models. Our experiments use cross-validation on the original sample prompts and a specifically collected test set of prompts from task-naive learners. We find that the choice of LLM strongly affects detection across our feature list. One of the most recent models, GPT-4, shows promising performance on most of the features. However, some closely related models (GPT-3, GPT-3.5 Turbo (Instruct)) behave differently when classifying features. We highlight the need for further research in light of the possible impact of design choices on the selection of features and detection prompts. Our findings are relevant to researchers and practitioners in generative AI literacy, as well as researchers in computer-supported learning assessment.
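The framework described above turns each guideline into a feature and combines its description with a few annotated sample prompts to form a few-shot detector. A minimal sketch of how such a detector prompt might be assembled is shown below; the feature name, description, and sample prompts are hypothetical illustrations, not the paper's actual guidelines or data.

```python
def build_detector_prompt(feature_name, feature_description,
                          annotated_samples, learner_prompt):
    """Assemble a few-shot classification prompt asking an LLM whether a
    learner's prompt exhibits a given guideline feature.

    annotated_samples: list of (sample_prompt, bool) pairs used as shots.
    """
    lines = [
        f"Feature: {feature_name}",
        f"Description: {feature_description}",
        "Decide whether the learner prompt below exhibits this feature.",
        "Answer YES or NO.",
        "",
    ]
    # Each annotated sample becomes one in-context example (a "shot").
    for sample, label in annotated_samples:
        lines.append(f"Prompt: {sample}")
        lines.append(f"Answer: {'YES' if label else 'NO'}")
        lines.append("")
    # The learner's prompt is appended last, leaving the answer open
    # for the LLM to complete.
    lines.append(f"Prompt: {learner_prompt}")
    lines.append("Answer:")
    return "\n".join(lines)

detector_prompt = build_detector_prompt(
    "role_assignment",
    "The prompt assigns the chatbot an explicit role or persona.",
    [("You are a travel agent. Plan a trip to Rome.", True),
     ("Tell me about Rome.", False)],
    "Act as a history tutor and explain the Roman Republic.",
)
print(detector_prompt)
```

The resulting string would be sent to each of the tested LLMs, and an ensemble could be derived by, for example, majority-voting the YES/NO answers across models.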
Pages: 184-201
Page count: 18