Data Augmentation for Sparse Multidimensional Learning Performance Data Using Generative AI

Cited by: 1
Authors
Zhang, Liang [1]
Lin, Jionghao [2, 3, 4]
Sabatini, John [1]
Borchers, Conrad [3]
Weitekamp, Daniel [3]
Cao, Meng [3]
Hollander, John [5]
Hu, Xiangen [6]
Graesser, Arthur C. [1]
Affiliations
[1] Univ Memphis, Inst Intelligent Syst, Memphis, TN 38152 USA
[2] Univ Hong Kong, Fac Educ, Hong Kong, Peoples R China
[3] Carnegie Mellon Univ, Human Comp Interact Inst, Pittsburgh, PA 15213 USA
[4] Monash Univ, Fac Informat Technol, Ctr Learning Analyt, Clayton, Vic 3800, Australia
[5] Arkansas State Univ, Jonesboro, AR 72401 USA
[6] Hong Kong Polytech Univ, Dept Appl Social Sci, Hong Kong, Peoples R China
Source
IEEE TRANSACTIONS ON LEARNING TECHNOLOGIES | 2025 / Vol. 18
Keywords
Data augmentation; data sparsity; generative artificial intelligence (GenAI); intelligent tutoring system (ITS); learning performance data; FRAMEWORK;
DOI
10.1109/TLT.2025.3526582
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Learning performance data, such as correct or incorrect answers and problem-solving attempts in intelligent tutoring systems (ITSs), facilitate the assessment of knowledge mastery and the delivery of effective instruction. However, these data tend to be highly sparse (80% to 90% missing observations) in most real-world applications. This data sparsity makes it difficult to use learner models to predict learners' future performance and to explore new hypotheses about learning. This article proposes a systematic framework for augmenting learning performance data to address data sparsity. First, learning performance data are represented as a 3-D tensor with dimensions corresponding to learners, questions, and attempts, which captures longitudinal knowledge states during learning. Second, a tensor factorization method is used to impute missing values in sparse tensors of collected learner data, thereby grounding the imputation in knowledge tracing (KT) tasks that predict missing performance values from real observations. Third, generative artificial intelligence models, namely generative adversarial networks (GANs, specifically vanilla GANs) and generative pretrained transformers (GPTs, specifically GPT-4o), are used to generate augmented data tailored to individual clusters of learning performance. We tested this framework on adult literacy datasets from AutoTutor lessons developed for adult reading comprehension. We found that tensor factorization outperformed baseline KT techniques in tracing and predicting learning performance, demonstrating higher fidelity in data imputation. The vanilla GAN-based augmentation showed greater overall stability across varying sample sizes, whereas GPT-4o-based augmentation exhibited higher variability, with occasional cases showing closer fidelity to the original data distribution. This framework facilitates the effective augmentation of learning performance data, enabling a controlled, cost-effective approach to evaluating and optimizing ITS instructional designs in both online and offline environments prior to deployment, and supporting advanced educational data mining and learning analytics.
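The first two steps described in the abstract (a learners x questions x attempts tensor, followed by factorization-based imputation) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the synthetic data generator, the names `make_sparse_demo` and `cp_impute`, the rank, the learning rate, and the choice of a plain CP (CANDECOMP/PARAFAC) model fitted by gradient descent on observed cells only. The abstract does not specify which tensor factorization variant the authors used, so this should not be read as their method.

```python
import numpy as np

def make_sparse_demo(n_learners=20, n_questions=10, n_attempts=5,
                     missing=0.85, seed=1):
    """Hypothetical synthetic data: binary correctness with ~85% of cells missing (NaN)."""
    rng = np.random.default_rng(seed)
    ability = rng.uniform(0.3, 0.9, size=(n_learners, 1, 1))
    easiness = rng.uniform(0.5, 1.0, size=(1, n_questions, 1))
    practice = np.linspace(0.6, 1.0, n_attempts).reshape(1, 1, -1)
    p_correct = ability * easiness * practice
    outcomes = (rng.random(p_correct.shape) < p_correct).astype(float)
    outcomes[rng.random(p_correct.shape) < missing] = np.nan
    return outcomes

def cp_impute(tensor, rank=2, lr=1.0, reg=1e-3, epochs=2000, seed=0):
    """Fit a rank-`rank` CP model to the observed cells; return (completed, loss_history)."""
    rng = np.random.default_rng(seed)
    mask = ~np.isnan(tensor)                 # True where a response was observed
    observed = np.where(mask, tensor, 0.0)
    n_obs = mask.sum()
    # One factor matrix per mode: learners (L), questions (Q), attempts (A).
    L, Q, A = (rng.uniform(0.5, 1.0, size=(dim, rank)) for dim in tensor.shape)
    history = []
    for _ in range(epochs):
        pred = np.einsum("ir,jr,kr->ijk", L, Q, A)
        resid = np.where(mask, pred - observed, 0.0)   # error on observed cells only
        history.append(0.5 * np.sum(resid ** 2) / n_obs)
        g = resid / n_obs
        # Gradients of the masked squared error plus a small L2 penalty.
        gL = np.einsum("ijk,jr,kr->ir", g, Q, A) + reg * L
        gQ = np.einsum("ijk,ir,kr->jr", g, L, A) + reg * Q
        gA = np.einsum("ijk,ir,jr->kr", g, L, Q) + reg * A
        L -= lr * gL
        Q -= lr * gQ
        A -= lr * gA
    completed = np.clip(np.einsum("ir,jr,kr->ijk", L, Q, A), 0.0, 1.0)
    return completed, history

tensor = make_sparse_demo()
completed, history = cp_impute(tensor)
print("missing fraction:", round(float(np.isnan(tensor).mean()), 3))
print("observed-cell loss, first vs last epoch:", history[0], history[-1])
```

The completed tensor holds an estimated probability of a correct response for every learner, question, and attempt, including the cells that were never observed. In the framework the abstract describes, such a dense tensor would then serve as input to the generative augmentation step.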
Pages: 145-164
Page count: 20