Over-Reasoning and Redundant Calculation of Large Language Models

Cited by: 0
Authors
Chiang, Cheng-Han [1]
Lee, Hung-yi [1]
Affiliations
[1] Natl Taiwan Univ, Taipei, Taiwan
Source
PROCEEDINGS OF THE 18TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 2: SHORT PAPERS | 2024
Keywords
DOI
None available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large language models (LLMs) can solve problems step-by-step. While this chain-of-thought (CoT) reasoning boosts LLMs' performance, it is unclear whether LLMs know when to use CoT and whether the CoT reasoning is always necessary to answer the question. This paper shows that LLMs tend to generate redundant calculations and reasoning on a manually constructed math QA dataset, GSM8K-Zero. GSM8K-Zero is constructed such that the questions can be answered without any calculations, yet LLMs, including Llama-2 models and Claude-2, tend to generate lengthy and unnecessary calculations to answer them. We also conduct experiments to explain why LLMs generate redundant calculations and reasoning. GSM8K-Zero is publicly available at https://github.com/d223302/Over-Reasoning-of-LLMs and https://huggingface.co/datasets/dcml0714/GSM8K-Zero.
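The abstract notes that GSM8K-Zero is released on the Hugging Face Hub. As a quick orientation, the following is a minimal sketch of loading and inspecting the dataset with the Python `datasets` library; the dataset ID is taken from the URL above, while the available splits and column names are assumptions not stated in this record.

    # Minimal sketch: load GSM8K-Zero from the Hugging Face Hub.
    # Assumes the `datasets` library is installed; split and column names
    # are not confirmed by the record above.
    from datasets import load_dataset

    # Dataset ID taken from the abstract's Hugging Face URL.
    dataset = load_dataset("dcml0714/GSM8K-Zero")

    # Inspect the available splits and one example entry.
    print(dataset)
    first_split = next(iter(dataset.values()))
    print(first_split[0])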
Pages: 161-169
Number of pages: 9