Optimization methods for the imputation of missing values in Educational Institutions Data

Times cited: 0
Authors
Aureli, D. [1]
Bruni, R. [2]
Daraio, C. [2]
Affiliations
[1] Sapienza Univ Rome, Dept Informat Engn Elect & Telecommun, Rome, Italy
[2] Sapienza Univ Rome, Dept Comp Control & Management Engn, Rome, Italy
Funding
European Union Horizon 2020;
Keywords
Information Reconstruction; Data imputation; Machine learning; Interconnected data; Educational Institutions;
DOI
10.1016/j.mex.2020.101208
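Chinese Library Classification (CLC) numbers
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];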
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
The imputation of missing values in the detailed data of Educational Institutions is a difficult task. These data contain multivariate time series, which many existing imputation techniques cannot impute satisfactorily. Moreover, almost all the data of an Institution are interconnected: the number of graduates is not independent of the number of students, the expenditure is not independent of the staff, and so on. In other words, each imputed value has an impact on the whole set of data of the Institution. Therefore, imputation techniques for this specific case should be designed very carefully. We describe here the methods and the code of the imputation methodology developed to impute the various patterns of missing values that appear in this kind of interconnected data. The first part of the proposed methodology, called "trend smoothing imputation", is designed to impute missing values in time series while respecting the trend and the other features of an Institution. The second part, called "donor imputation", is designed to impute larger chunks of missing data by using values taken from similar Institutions, again so as to respect their size and trend. Trend smoothing imputation can handle missing subsequences in time series, and is given by a weighted combination of: (a) a weighted average of the other available values of the sequence, and (b) linear regression. Donor imputation can handle entirely missing sequences in time series. It imputes the Recipient Institution using the values taken from a similar Institution, called the Donor, which is selected using optimization criteria. The values imputed by these techniques should respect the trend, the size and the ratios of each Institution. (C) 2021 The Authors. Published by Elsevier B.V.
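The abstract describes both techniques only at a high level, so the Python sketch below illustrates one possible reading of each; it is not the authors' code. The inverse-distance weights, the mixing coefficient `alpha`, the squared log-difference criterion for choosing the Donor, the per-year rescaling by a "reference" variable (for example, student counts), and the function names `trend_smoothing_impute` and `donor_impute` are all hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence

Series = List[Optional[float]]  # None marks a missing year


def _linear_fit(ts: Sequence[float], ys: Sequence[float]) -> tuple:
    """Ordinary least-squares line y = a + b*t through the observed points."""
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    if sxx == 0:  # a single observed year: fall back to a flat line
        return y_mean, 0.0
    b = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys)) / sxx
    return y_mean - b * t_mean, b


def trend_smoothing_impute(series: Series, alpha: float = 0.5) -> List[float]:
    """Fill gaps with alpha * weighted average + (1 - alpha) * linear trend.

    ASSUMPTIONS (hypothetical, not from the paper): the average weights each
    observed year by the inverse of its distance in time from the missing
    year, and alpha is a free mixing parameter.
    """
    obs = [(t, y) for t, y in enumerate(series) if y is not None]
    if not obs:
        raise ValueError("no observed values: use donor imputation instead")
    ts, ys = zip(*obs)
    a, b = _linear_fit(ts, ys)
    filled = []
    for t, y in enumerate(series):
        if y is not None:
            filled.append(y)  # observed values are never modified
            continue
        w = [1.0 / abs(t - s) for s in ts]  # closer years weigh more
        avg = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
        filled.append(alpha * avg + (1 - alpha) * (a + b * t))
    return filled


def donor_impute(recipient_ref: Sequence[float],
                 candidates: Dict[str, tuple]) -> tuple:
    """Impute a fully missing series for the Recipient from the best Donor.

    candidates maps a hypothetical institution id to (ref_series,
    target_series), both complete and strictly positive.
    ASSUMPTIONS: the Donor minimizes the squared log-difference between its
    reference series and the Recipient's (penalizing both size and trend
    mismatches), and the Donor's target series is rescaled year by year so
    that the target/reference ratio is preserved.
    """
    def distance(ref: Sequence[float]) -> float:
        return sum((math.log(r) - math.log(d)) ** 2
                   for r, d in zip(recipient_ref, ref))

    donor = min(candidates, key=lambda k: distance(candidates[k][0]))
    d_ref, d_target = candidates[donor]
    imputed = [x * r / d for x, r, d in zip(d_target, recipient_ref, d_ref)]
    return donor, imputed
```

For example, `trend_smoothing_impute([100.0, None, None, 130.0, 140.0])` fills the two gaps by mixing an inverse-distance weighted mean of the observed years with the least-squares trend; setting `alpha` to 1 or 0 reduces it to pure weighted averaging or pure linear regression. `donor_impute` returns the id of the selected Donor together with the rescaled series, so the Recipient inherits the Donor's ratios rather than its absolute size.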
Pages: 5
Related papers
  • [1] Imputation methods for missing data in educational diagnostic evaluation
    Fernandez-Alonso, Ruben
    Suarez-Alvarez, Javier
    Muniz, Jose
    PSICOTHEMA, 2012, 24 (01) : 167 - 175
  • [2] Methods for imputation of missing values in air quality data sets
    Junninen, H
    Niska, H
    Tuppurainen, K
    Ruuskanen, J
    Kolehmainen, M
    ATMOSPHERIC ENVIRONMENT, 2004, 38 (18) : 2895 - 2907
  • [3] Imputation techniques for the reconstruction of missing interconnected data from higher Educational Institutions
    Bruni, Renato
    Daraio, Cinzia
    Aureli, Davide
    KNOWLEDGE-BASED SYSTEMS, 2021, 212
  • [4] Missing Data and Imputation Methods
    Schober, Patrick
    Vetter, Thomas R.
    ANESTHESIA AND ANALGESIA, 2020, 131 (05): 1419 - 1420
  • [6] A Comparison of Various Imputation Methods for Missing Values in Air Quality Data
    Zainuri, Nuryazmin Ahmat
    Jemain, Abdul Aziz
    Muda, Nora
    SAINS MALAYSIANA, 2015, 44 (03): 449 - 456
  • [7] Computational Methods for Data Integration and Imputation of Missing Values in Omics Datasets
    Schumann, Yannis
    Gocke, Antonia
    Neumann, Julia E.
    PROTEOMICS, 2025, 25 (1-2)
  • [8] From Predictive Methods to Missing Data Imputation: An Optimization Approach
    Bertsimas, Dimitris
    Pawlowski, Colin
    Zhuo, Ying Daisy
    JOURNAL OF MACHINE LEARNING RESEARCH, 2018, 18
  • [9] Imputation of missing values for compositional data using classical and robust methods
    Hron, K.
    Templ, M.
    Filzmoser, P.
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2010, 54 (12) : 3095 - 3107
  • [10] Imputation methods to deal with missing values when data mining trauma injury data
    Penny, Kay I.
    Chesney, Thomas
    ITI 2006: PROCEEDINGS OF THE 28TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY INTERFACES, 2006: 213+