Data Augmentation for Sparse Multidimensional Learning Performance Data Using Generative AI

Cited by: 1

Authors:

Zhang, Liang [1]

Lin, Jionghao [2,3,4]

Sabatini, John [1]

Borchers, Conrad [3]

Weitekamp, Daniel [3]

Cao, Meng [3]

Hollander, John [5]

Hu, Xiangen [6]

Graesser, Arthur C. [1]

Affiliations:

[1] Univ Memphis, Inst Intelligent Syst, Memphis, TN 38152 USA

[2] Univ Hong Kong, Fac Educ, Hong Kong, Peoples R China

[3] Carnegie Mellon Univ, Human Comp Interact Inst, Pittsburgh, PA 15213 USA

[4] Monash Univ, Fac Informat Technol, Ctr Learning Analyt, Clayton, Vic 3800, Australia

[5] Arkansas State Univ, Jonesboro, AR 72401 USA

[6] Hong Kong Polytech Univ, Dept Appl Social Sci, Hong Kong, Peoples R China

Source:

IEEE TRANSACTIONS ON LEARNING TECHNOLOGIES | 2025 / Vol. 18

Keywords:

Data augmentation; data sparsity; generative artificial intelligence (GenAI); intelligent tutoring system (ITS); learning performance data; FRAMEWORK;

DOI:

10.1109/TLT.2025.3526582

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline code:

081203 ; 0835 ;

Abstract:

Learning performance data, such as correct or incorrect answers and problem-solving attempts in intelligent tutoring systems (ITSs), facilitate the assessment of knowledge mastery and the delivery of effective instruction. However, these data tend to be highly sparse (80% to 90% missing observations) in most real-world applications. This sparsity makes it difficult for learner models to predict learners' future performance effectively and to explore new hypotheses about learning. This article proposes a systematic framework for augmenting learning performance data to address data sparsity. First, learning performance data can be represented as a 3-D tensor with dimensions corresponding to learners, questions, and attempts, effectively capturing longitudinal knowledge states during learning. Second, a tensor factorization method is used to impute missing values in sparse tensors of collected learner data, thereby grounding the imputation in knowledge tracing (KT) tasks that predict missing performance values from real observations. Third, generative artificial intelligence models, specifically vanilla generative adversarial networks (GANs) and generative pretrained transformers (GPTs, specifically GPT-4o), are used to augment the data, generating samples tailored to individual clusters of learning performance. We tested this systematic framework on adult literacy datasets from AutoTutor lessons developed for adult reading comprehension. We found that tensor factorization outperformed baseline KT techniques in tracing and predicting learning performance, demonstrating higher fidelity in data imputation. Vanilla GAN-based augmentation demonstrated greater overall stability across varying sample sizes, whereas GPT-4o-based augmentation exhibited higher variability, with occasional cases showing closer fidelity to the original data distribution.
This framework facilitates the effective augmentation of learning performance data, enabling a controlled, cost-effective approach to evaluating and optimizing ITS instructional designs in both online and offline environments prior to deployment, and supporting advanced educational data mining and learning analytics.
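To make the first two steps of the framework concrete, the sketch below stores learning performance data as a learners x questions x attempts tensor (nested lists, with `None` marking a missing attempt, 1 a correct answer, and 0 an incorrect one) and fills the missing cells with a low-rank tensor factorization. The abstract does not say which factorization variant the authors use, so this sketch assumes a plain rank-R CP (CANDECOMP/PARAFAC) model fitted by stochastic gradient descent on observed cells only. The function names `sparsity` and `cp_impute`, the encoding, and all hyperparameters (`rank`, `epochs`, `lr`, `reg`) are hypothetical illustrations, not the paper's implementation.

```python
import random


def sparsity(tensor):
    """Fraction of missing (None) cells in a learners x questions x attempts tensor."""
    cells = [v for learner in tensor for question in learner for v in question]
    return sum(v is None for v in cells) / len(cells)


def cp_impute(tensor, rank=2, epochs=500, lr=0.05, reg=0.01, seed=0):
    """Impute missing cells with a rank-`rank` CP model fitted to observed cells only.

    Assumption: a plain CP model trained by SGD with L2 regularization; this is an
    illustrative stand-in, not necessarily the factorization used in the article.
    """
    rng = random.Random(seed)
    n_learners, n_questions, n_attempts = len(tensor), len(tensor[0]), len(tensor[0][0])

    # One latent factor matrix per tensor mode: learners (U), questions (V), attempts (W).
    U = [[rng.uniform(0, 0.5) for _ in range(rank)] for _ in range(n_learners)]
    V = [[rng.uniform(0, 0.5) for _ in range(rank)] for _ in range(n_questions)]
    W = [[rng.uniform(0, 0.5) for _ in range(rank)] for _ in range(n_attempts)]

    # Training signal comes only from real observations, never from missing cells.
    observed = [
        (i, j, k, tensor[i][j][k])
        for i in range(n_learners)
        for j in range(n_questions)
        for k in range(n_attempts)
        if tensor[i][j][k] is not None
    ]

    def predict(i, j, k):
        # CP reconstruction: sum over components of the product of the three factors.
        return sum(U[i][r] * V[j][r] * W[k][r] for r in range(rank))

    for _ in range(epochs):
        rng.shuffle(observed)
        for i, j, k, y in observed:
            err = predict(i, j, k) - y
            for r in range(rank):
                u, v, w = U[i][r], V[j][r], W[k][r]
                # Gradient of the squared error plus L2 penalty, using pre-update values.
                U[i][r] -= lr * (err * v * w + reg * u)
                V[j][r] -= lr * (err * u * w + reg * v)
                W[k][r] -= lr * (err * u * v + reg * w)

    # Keep observed values as-is; clip predictions for missing cells to [0, 1].
    return [
        [
            [
                tensor[i][j][k]
                if tensor[i][j][k] is not None
                else min(1.0, max(0.0, predict(i, j, k)))
                for k in range(n_attempts)
            ]
            for j in range(n_questions)
        ]
        for i in range(n_learners)
    ]


# Toy example: 3 learners x 2 questions x 2 attempts, with 5 of 12 cells missing.
data = [
    [[1, None], [0, 1]],
    [[None, 1], [None, None]],
    [[0, 0], [1, None]],
]
filled = cp_impute(data, rank=2)
```

The imputed values are continuous estimates of the probability of a correct attempt; a real pipeline would tune the rank and regularization against held-out observations, which is how the article frames imputation as a KT prediction task.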


Pages: 145-164

Page count: 20
