AI-assisted automated scoring of picture-cued writing tasks for language assessment

Cited by: 9
Authors
Zhao, Ruibin [1, 2]
Zhuang, Yipeng [1]
Zou, Di [3]
Xie, Qin [4]
Yu, Philip L. H. [1]
Affiliations
[1] Educ Univ Hong Kong, Dept Math & Informat Technol, 10 Lo Ping Rd, Hong Kong, Peoples R China
[2] Chuzhou Univ, Sch Comp Sci & Informat Engn, Chuzhou, Peoples R China
[3] Educ Univ Hong Kong, Dept English Language Educ, 10 Lo Ping Rd, Hong Kong, Peoples R China
[4] Educ Univ Hong Kong, Dept Linguist & Modern Languages, 10 Lo Ping Rd, Hong Kong, Peoples R China
Keywords
Automated writing assessment; Picture-cued writing; Cross-modal matching; Artificial intelligence; Feedback; Performance
DOI
10.1007/s10639-022-11473-y
Chinese Library Classification (CLC) number
G40 [Education]
Discipline classification code
040101; 120403
Abstract
Grading assignments is inherently subjective and time-consuming; automatic scoring tools can greatly reduce teacher workload and shorten the time needed to provide feedback to learners. This paper proposes a novel method for automatically scoring student responses to picture-cued writing tasks. As a popular paradigm for language instruction and assessment, a picture-cued writing task typically requires students to describe one or more pictures. Correspondingly, automatic scoring methods must measure the link(s) between pictures and their textual descriptions. For this purpose, we first designed a picture-cued writing test and collected nearly 4,000 responses from 279 K-12 students. Based on these responses, we then developed an AI scoring model by incorporating emerging cross-modal matching technology and several NLP algorithms. The model was evaluated with six popular measures and produced accurate scores, with a small mean absolute error of 0.479 and a high adjacent-agreement rate of 90.64%. We believe this method could reduce the subjective elements inherent in human grading and free teachers' time from the mundane task of grading for other valuable endeavors, such as designing teaching plans based on AI-generated diagnosis of student progress.
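
The two headline figures in the abstract, mean absolute error and adjacent-agreement rate, can be illustrated with a minimal Python sketch. It assumes the common convention that two scores are "adjacent" when they differ by at most one point; this record does not state the paper's exact definition or score scale. The function names are hypothetical and the scores are invented for illustration, not taken from the study.

```python
from typing import Sequence


def mean_absolute_error(human: Sequence[float], model: Sequence[float]) -> float:
    """Average absolute difference between paired human and model scores."""
    if len(human) != len(model) or len(human) == 0:
        raise ValueError("score lists must be non-empty and of equal length")
    return sum(abs(h - m) for h, m in zip(human, model)) / len(human)


def adjacent_agreement_rate(
    human: Sequence[float], model: Sequence[float], tolerance: float = 1.0
) -> float:
    """Share of responses where the model is within `tolerance` points of the human.

    Assumption: tolerance = 1 score point, the usual meaning of "adjacent".
    """
    if len(human) != len(model) or len(human) == 0:
        raise ValueError("score lists must be non-empty and of equal length")
    hits = sum(1 for h, m in zip(human, model) if abs(h - m) <= tolerance)
    return hits / len(human)


# Invented example scores (not data from the paper).
human_scores = [3, 4, 2, 5, 1, 3]
model_scores = [3, 3, 4, 5, 2, 3]

mae = mean_absolute_error(human_scores, model_scores)
aar = adjacent_agreement_rate(human_scores, model_scores)
print(f"MAE = {mae:.3f}, adjacent agreement = {aar:.2%}")
```

A lower MAE and a higher adjacent-agreement rate both indicate closer alignment between machine and human scores. The two are complementary: MAE reflects the average size of disagreements, while adjacent agreement reflects how often disagreements stay within one point.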
Pages: 7031-7063
Number of pages: 33