Code Soliloquies for Accurate Calculations in Large Language Models

Cited: 5
Authors
Sonkar, Shashank [1 ]
Chen, Xinghe [1 ]
Le, MyCo [1 ]
Liu, Naiming [1 ]
Mallick, Debshila Basu [2 ]
Baraniuk, Richard G. [1 ]
Affiliations
[1] Rice Univ, Houston, TX 77005 USA
[2] Rice Univ, OpenStax, Houston, TX USA
Source
FOURTEENTH INTERNATIONAL CONFERENCE ON LEARNING ANALYTICS & KNOWLEDGE, LAK 2024 | 2024
Keywords
TUTOR;
DOI
10.1145/3636555.3636889
CLC Classification Number
TP3 [Computing Technology; Computer Technology]
Subject Classification Number
0812
Abstract
High-quality conversational datasets are crucial for the successful development of Intelligent Tutoring Systems (ITS) that utilize a Large Language Model (LLM) backend. Synthetic student-teacher dialogues, generated with advanced GPT-4 models, are a common strategy for creating these datasets. However, subjects like physics that entail complex calculations pose a challenge. While GPT-4 has impressive language processing capabilities, its limitations in fundamental mathematical reasoning curtail its efficacy for such subjects. To address this limitation, in this paper we introduce a novel stateful prompt design. Our design orchestrates a mock conversation in which both the student and tutorbot roles are simulated by GPT-4. Each student response triggers an internal monologue, or 'code soliloquy', in the GPT tutorbot, which assesses whether its next response would require calculations. If a calculation is deemed necessary, it scripts the relevant Python code and uses the Python output to construct its response to the student. Our approach notably enhances the quality of synthetic conversation datasets, especially for calculation-intensive subjects. Preliminary Subject Matter Expert evaluations reveal that our Higgs model, a fine-tuned LLaMA model, effectively uses Python for computations, which significantly improves the accuracy and computational reliability of Higgs' responses.
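The control flow the abstract describes can be sketched as a small loop: for each student turn, a soliloquy step decides whether a calculation is needed; if so, Python code is generated and executed, and its output is woven into the tutorbot's reply. The sketch below is illustrative only, assuming hypothetical helper names (`needs_calculation`, `write_python`, `run_code`, `tutorbot_turn`); in the actual system the first two steps are performed by GPT-4 via the stateful prompt, not by the toy stand-ins shown here.

```python
# Minimal sketch of the "code soliloquy" turn loop (hypothetical names).
# The heuristic and the fixed kinematics snippet are placeholders for
# GPT-4's internal monologue and generated code, respectively.

def needs_calculation(student_msg: str) -> bool:
    # Soliloquy step: in the paper this is GPT-4 asking itself whether
    # its next response requires math; a keyword check stands in here.
    triggers = ("compute", "calculate", "how far", "how fast")
    return any(t in student_msg.lower() for t in triggers)

def write_python(student_msg: str) -> str:
    # In the paper the tutorbot scripts problem-specific Python; a fixed
    # free-fall snippet (d = 0.5 * g * t**2) stands in as a placeholder.
    return "result = 0.5 * 9.8 * 3**2  # distance fallen in 3 s (metres)"

def run_code(code: str) -> float:
    # Execute the scripted code and pull out its result.
    # A real system would sandbox this instead of calling exec directly.
    ns: dict = {}
    exec(code, {}, ns)
    return ns["result"]

def tutorbot_turn(student_msg: str) -> str:
    # One tutorbot turn: soliloquy -> (optional) code -> response.
    if needs_calculation(student_msg):
        value = run_code(write_python(student_msg))
        return f"Using Python I get {value:.1f} m. Let's unpack how that follows."
    return "Good question - let's reason about the concept first."

print(tutorbot_turn("Can you calculate how far it falls in 3 seconds?"))
```

The key design point mirrored here is that the numeric value in the reply always comes from executed Python output, never from the language model's own arithmetic.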
Pages: 828-835
Page count: 8