Fine-Tuning ChatGPT for Automatic Scoring of Written Scientific Explanations in Chinese

Cited by: 0
Authors
Yang, Jie [1 ,4 ]
Latif, Ehsan [2 ,3 ]
He, Yuze [4 ]
Zhai, Xiaoming [2 ,3 ]
Affiliations
[1] Beijing Normal Univ, Sch Phys & Astron, Beijing 100875, Peoples R China
[2] Univ Georgia, AI4STEM Educ Ctr, Athens, GA 30602 USA
[3] Univ Georgia, Dept Math Sci & Social Studies Educ, Athens, GA 30602 USA
[4] Beijing Normal Univ, Res Inst Sci Educ, Beijing 100875, Peoples R China
Keywords
ChatGPT; Fine-tuning; Automatic scoring; Scientific explanations
DOI
10.1007/s10956-025-10199-z
Chinese Library Classification
G40 [Education]
Discipline codes
040101; 120403
Abstract
The development of explanations for scientific phenomena is crucial in science assessment. However, scoring students' written explanations is a challenging and resource-intensive process. Large language models (LLMs) have demonstrated the potential to address these challenges, particularly when the explanations are written in English, an alphabetic language. Whether this approach can be applied to logographic languages remains unknown. This study therefore explores the potential of fine-tuning ChatGPT, an advanced LLM, to automatically score scientific explanations written in Chinese. We collected and automatically scored student responses to seven scientific explanation tasks in Chinese and examined the relationship between scoring accuracy and reasoning complexity using Kendall correlation. Finally, a qualitative analysis was conducted to explore how linguistic features influence scoring accuracy. The results indicate that, through domain-specific adaptation, the fine-tuned ChatGPT can accurately score students' written explanations in Chinese. However, scoring accuracy correlates with reasoning complexity: negatively for lower-level responses and positively for higher-level responses. The model tends to overrate low-level responses written with complex sentence structures, attributing more complex reasoning to them than they contain, and to underrate high-level responses that use generalizing, summarizing, or simple causal reasoning. These opposing correlations are associated with different linguistic features. In terms of scoring accuracy, the comprehensiveness of student responses is often in tension with the simplicity and clarity of language structure. For lower-level responses, simplicity and clarity are prioritized, leading to more accurate scores for simpler and shorter responses. For higher-level responses, comprehensiveness is prioritized, resulting in more accurate scores for long and information-rich responses. These findings demonstrate the effectiveness of LLMs in automatic scoring within a Chinese context and highlight the importance of considering linguistic features and reasoning complexity when developing and fine-tuning automatic scoring models for educational assessments.
Pages: 18