Imputation techniques for the reconstruction of missing interconnected data from higher Educational Institutions

Cited by: 12
Authors
Bruni, Renato [1]
Daraio, Cinzia [1]
Aureli, Davide [2]
Affiliations
[1] Sapienza Univ Rome, Dept Comp Control & Management Engn, Rome, Italy
[2] Sapienza Univ Rome, Dept Informat Engn Elect & Telecommun, Rome, Italy
Funding
EU Horizon 2020;
Keywords
Data imputation; Information reconstruction; Machine learning; Educational Institutions; VALUES; INFORMATION; ALGORITHM;
DOI
10.1016/j.knosys.2020.106512
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Data on Educational Institutions form the basis for several important analyses of educational systems; however, they often contain non-negligible shares of missing values, for several reasons. This work considers the relevant case of the European Tertiary Education Register (ETER), which describes the Educational Institutions of Europe. The presence of missing values prevents the full exploitation of this database, since several types of analyses that could be performed are currently impracticable. Imputing artificial data, reconstructed with the aim of being statistically equivalent to the (unknown) missing data, would make it possible to overcome these problems. A main complication in imputing this type of data is the correlations that exist among all the variables. We propose several imputation techniques designed to deal with the different types of missing values appearing in these interconnected data, and we use these techniques to impute the database. Moreover, we evaluate the accuracy of the proposed approach by artificially introducing missing data, imputing them, and comparing the imputed values with the original ones. Results show that the information reconstruction does not introduce statistically significant changes in the data and that the imputed values are close enough to the original values. © 2020 Elsevier B.V. All rights reserved.
Pages: 11