Ensuring privacy through synthetic data generation in education

Times cited: 1
Authors
Liu, Qinyi [1]
Shakya, Ronas [1]
Jovanovic, Jelena [2,3]
Khalil, Mohammad [1]
de la Hoz-Ruiz, Javier [4]
Affiliations
[1] Univ Bergen, Ctr Sci Learning & Technol, Bergen, Norway
[2] Univ Belgrade, Fac Org Sci, Belgrade, Serbia
[3] Univ Bergen, Ctr Sci Learning & Technol SLATE, Bergen, Norway
[4] Univ Granada, Fac Educ Sci, Granada, Spain
Keywords
artificial intelligence for education; educational data mining; privacy; synthetic data
DOI
10.1111/bjet.13576
Chinese Library Classification (CLC) number
G40 [Education]
Subject classification codes
040101; 120403
Abstract
High-volume, high-quality and diverse datasets are crucial for advancing research in education. However, such datasets often contain sensitive information that poses significant privacy challenges. Traditional anonymisation techniques fail to meet the privacy standards required by regulations such as the GDPR, prompting the need for more robust solutions. Synthetic data have emerged as a promising privacy-preserving approach, allowing datasets that mimic real data to be generated and shared while preserving privacy. Used alone, however, synthetic educational data remain vulnerable to privacy threats such as linkage attacks. This study therefore explores, for the first time in the education sector, the application of private synthetic data, which combines synthetic data generation with differential privacy mechanisms. Considering the dual needs of data utility and privacy, we investigate how well various synthetic data generation techniques safeguard sensitive educational information. Our research addresses two key questions: how capable these techniques are of preventing privacy threats, and how they affect the utility of synthetic educational datasets. Through this investigation, we aim to bridge the gap in understanding the balance between privacy and utility of advanced privacy-preserving techniques in educational contexts.
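The abstract describes private synthetic data only at a high level: a synthetic data generator combined with a differential privacy mechanism, judged on both privacy and utility. As a minimal illustration, and not one of the generators evaluated in the study (the abstract does not name them), the Python sketch below adds Laplace noise to the full joint histogram of a few categorical attributes, samples synthetic records from the noisy histogram, and reports one simple utility measure, the total variation distance (TVD) between a real and a synthetic marginal. The function names `dp_histogram_synthesizer` and `marginal_tvd`, the toy attributes, and the add/remove-one-record neighbouring definition are assumptions made for this example.

```python
import random
from collections import Counter
from itertools import product


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two i.i.d. exponential variables with mean `scale`
    is Laplace-distributed with location 0 and that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_histogram_synthesizer(records, domains, epsilon, n_synthetic=None, seed=None):
    """Hypothetical helper: epsilon-DP synthetic data from categorical records.

    records: list of tuples, one value per attribute.
    domains: list of value lists per attribute; assumed public, NOT derived
             from the private records.
    epsilon: privacy budget. Under add/remove-one-record neighbouring, the
             joint histogram has L1 sensitivity 1, so the Laplace scale is 1/epsilon.
    n_synthetic: number of records to sample; if None, the rounded noisy
                 total is used (releasing len(records) directly would leak it).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    counts = Counter(records)
    cells = list(product(*domains))  # every possible record, seen or not

    noisy = []
    for cell in cells:
        # Noise is added to every cell, including empty ones, so that the
        # absence of a value combination is protected too.
        noisy.append(max(counts.get(cell, 0) + laplace_noise(1.0 / epsilon, rng), 0.0))

    total = sum(noisy)
    weights = noisy if total > 0 else [1.0] * len(cells)
    if n_synthetic is None:
        n_synthetic = round(total)
    # Clipping, normalising and sampling are post-processing of the noisy
    # histogram, so the synthetic records keep the epsilon-DP guarantee.
    return rng.choices(cells, weights=weights, k=n_synthetic)


def marginal_tvd(real, synthetic, attr_index):
    """Total variation distance between one attribute's marginal distribution
    in two datasets: 0 means identical, 1 means disjoint supports."""
    if not real or not synthetic:
        raise ValueError("both datasets must be non-empty")
    p = Counter(r[attr_index] for r in real)
    q = Counter(s[attr_index] for s in synthetic)
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p[k] / len(real) - q[k] / len(synthetic)) for k in keys)


if __name__ == "__main__":
    # Made-up toy records (course, final_result); not data from the study.
    domains = [["AAA", "BBB"], ["Pass", "Fail", "Withdrawn"]]
    real = ([("AAA", "Pass")] * 40 + [("AAA", "Fail")] * 10
            + [("BBB", "Withdrawn")] * 25 + [("BBB", "Pass")] * 25)
    for eps in (0.1, 1.0, 10.0):
        synth = dp_histogram_synthesizer(real, domains, epsilon=eps, seed=0)
        if synth:
            print(f"epsilon={eps}: {len(synth)} records, "
                  f"TVD(final_result)={marginal_tvd(real, synth, 1):.3f}")
```

A smaller `epsilon` gives stronger privacy but noisier counts, which is the privacy-utility trade-off the study examines. Because the joint histogram grows exponentially with the number of attributes, this naive approach only suits a handful of low-cardinality columns; practical private generators instead fit noisy low-dimensional marginals or graphical models.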
Pages: 1053-1073
Number of pages: 21