Optimization methods for the imputation of missing values in Educational Institutions Data

Cited by: 0
Authors
Aureli, D. [1]
Bruni, R. [2]
Daraio, C. [2]
Affiliations
[1] Sapienza Univ Rome, Dept Informat Engn Elect & Telecommun, Rome, Italy
[2] Sapienza Univ Rome, Dept Comp Control & Management Engn, Rome, Italy
Funding
EU Horizon 2020;
Keywords
Information Reconstruction; Data imputation; Machine learning; Interconnected data; Educational Institutions;
DOI
10.1016/j.mex.2020.101208
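Chinese Library Classification number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification code
07; 0710; 09;
Abstract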
The imputation of missing values in the detail data of Educational Institutions is a difficult task. These data contain multivariate time series, which cannot be satisfactorily imputed by many existing imputation techniques. Moreover, almost all the data of an Institution are interconnected: the number of graduates is not independent of the number of students, the expenditure is not independent of the staff, etc. In other words, each imputed value has an impact on the whole set of data of the Institution. Therefore, imputation techniques for this specific case should be designed very carefully. We describe here the methods and the codes of the imputation methodology developed to impute the various patterns of missing values which appear in similar interconnected data. In particular, the first part of the proposed methodology, called "trend smoothing imputation", is designed to impute missing values in time series by respecting the trend and the other features of an Institution. The second part of the proposed methodology, called "donor imputation", is designed to impute larger chunks of missing data by using values taken from similar Institutions in order to respect again their size and trend. Trend smoothing imputation can handle missing subsequences in time series, and is given by a weighted combination of: (a) a weighted average of the other available values of the sequence, and (b) linear regression. Donor imputation can handle fully missing sequences in time series. It imputes the Recipient Institution using the values taken from a similar Institution, called Donor, selected using optimization criteria. The values imputed by our techniques should respect the trend, the size and the ratios of each Institution. (C) 2021 The Authors. Published by Elsevier B.V.
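Illustrative sketch (not the authors' published code): the two steps named in the abstract could look roughly as follows in Python. The abstract does not give the weights, the mixing coefficient, or the optimization criteria for donor selection, so the inverse-distance weights, the parameter `alpha`, the relative-difference distance, and the size-ratio rescaling are all assumptions made for this example, as are the function names.

```python
import numpy as np


def trend_smoothing_impute(series, alpha=0.5):
    """Fill NaN gaps in a 1-D time series.

    Each missing value is alpha * (weighted average of the observed values)
    + (1 - alpha) * (linear-regression prediction).
    ASSUMPTION: alpha and the inverse-distance weights are illustrative
    choices, not taken from the paper.
    """
    y = np.asarray(series, dtype=float)
    t = np.arange(len(y))
    observed = ~np.isnan(y)
    if observed.sum() < 2:
        raise ValueError("need at least two observed values to fit a trend")
    # Linear trend fitted on the observed points only.
    slope, intercept = np.polyfit(t[observed], y[observed], deg=1)
    filled = y.copy()
    for i in t[~observed]:
        # Closer observations weigh more; distance is >= 1 here.
        w = 1.0 / np.abs(t[observed] - i)
        weighted_avg = np.sum(w * y[observed]) / np.sum(w)
        regression = slope * i + intercept
        filled[i] = alpha * weighted_avg + (1 - alpha) * regression
    return filled


def select_donor(recipient, candidates):
    """Return the name of the candidate most similar to the recipient.

    recipient: 1-D array of fully observed, nonzero variables (e.g. sizes).
    candidates: dict mapping institution name -> array of the same variables.
    ASSUMPTION: similarity is the sum of relative absolute differences.
    """
    r = np.asarray(recipient, dtype=float)

    def distance(values):
        return np.sum(np.abs(np.asarray(values, dtype=float) - r) / np.abs(r))

    return min(candidates, key=lambda name: distance(candidates[name]))


def donor_impute(donor_series, recipient_size, donor_size):
    """Rescale the donor's series by the size ratio (assumed rule), so the
    imputed sequence keeps the donor's trend at the recipient's scale."""
    return np.asarray(donor_series, dtype=float) * (recipient_size / donor_size)
```

For example, `trend_smoothing_impute([10, float("nan"), 14, 16])` fills the gap with a value between the neighbourhood average and the fitted trend line, while `select_donor` followed by `donor_impute` covers the case where a whole sequence is missing for the Recipient.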
Pages: 5
Related papers
6 in total
[1] Bonaccorsi A., 2007, Universities and strategic knowledge creation: Specialization and performance in Europe
[2] Bruni, R. Error correction for massive datasets [J]. OPTIMIZATION METHODS & SOFTWARE, 2005, 20 (2-3): 291-310
[3] Bruni R., 2001, Advances in Intelligent Data Analysis. 4th International Conference, IDA 2001. Proceedings (Lecture Notes in Computer Science Vol. 2189), P84
[4] Bruni R., 2020, IMPUTATION TECHNIQUE
[5] Bruni, Renato; Daraio, Cinzia; Aureli, Davide. Imputation techniques for the reconstruction of missing interconnected data from higher Educational Institutions [J]. KNOWLEDGE-BASED SYSTEMS, 2021, 212
[6] Bruni, Renato; Daraio, Cinzia; Aureli, Davide. Information reconstruction in educational institutions data from the European tertiary education registry [J]. DATA IN BRIEF, 2021, 34