Fine-Tuning ChatGPT for Automatic Scoring of Written Scientific Explanations in Chinese

Cited by: 0
Authors
Yang, Jie [1 ,4 ]
Latif, Ehsan [2 ,3 ]
He, Yuze [4 ]
Zhai, Xiaoming [2 ,3 ]
Affiliations
[1] Beijing Normal Univ, Sch Phys & Astron, Beijing 100875, Peoples R China
[2] Univ Georgia, AI4STEM Educ Ctr, Athens, GA 30602 USA
[3] Univ Georgia, Dept Math Sci & Social Studies Educ, Athens, GA 30602 USA
[4] Beijing Normal Univ, Res Inst Sci Educ, Beijing 100875, Peoples R China
Keywords
ChatGPT; Fine-tuning; Automatic scoring; Scientific explanations
DOI
10.1007/s10956-025-10199-z
Chinese Library Classification
G40 [Education]
Discipline codes
040101; 120403
Abstract
The development of explanations for scientific phenomena is crucial in science assessment. However, scoring students' written explanations is a challenging and resource-intensive process. Large language models (LLMs) have demonstrated the potential to address these challenges, particularly when the explanations are written in English, an alphabetic language. Whether this approach can be applied to logographic languages remains unknown. This study therefore explores the potential of fine-tuning ChatGPT, an advanced LLM, to automatically score scientific explanations written in Chinese. We collected and automatically scored student responses to seven scientific explanation tasks in Chinese and examined the relationship between scoring accuracy and reasoning complexity using Kendall correlation. Finally, a qualitative analysis was conducted to explore how linguistic features influence scoring accuracy. The results indicate that, through domain-specific adaptation, the fine-tuned ChatGPT can accurately score students' written explanations in Chinese. However, scoring accuracy correlates with reasoning complexity: negatively for lower-level responses and positively for higher-level responses. The model tends to overrate low-level responses written with complex sentence structures, attributing more complex reasoning to them than they contain, and to underrate high-level responses that use generalizing, summarizing, or simple causal reasoning. These opposing correlations are associated with different linguistic features. In terms of scoring accuracy, the comprehensiveness of student responses is often in tension with the simplicity and clarity of language structure. For lower-level responses, simplicity and clarity are prioritized, leading to more accurate scores for simpler and shorter responses. For higher-level responses, comprehensiveness is prioritized, resulting in more accurate scores for long and information-rich responses. These findings demonstrate the effectiveness of LLMs in automatic scoring within a Chinese context and highlight the importance of considering linguistic features and reasoning complexity when developing and fine-tuning automatic scoring models for educational assessments.
Pages: 18