Estimating missing data using novel correlation maximization based methods

Cited by: 16
Authors
Sefidian, Amir Masoud [1]
Daneshpour, Negin [1]
Affiliations
[1] Shahid Rajaee Teacher Training Univ, Fac Comp Engn, Tehran, Iran
Keywords
Missing values; Imputation; Correlation; Regression; FUZZY C-MEANS; K-NEAREST NEIGHBORS; VALUE IMPUTATION; GENETIC ALGORITHM; VALUES; CLASSIFICATION; REGRESSION; FRAMEWORK; SELECTION; PATTERNS;
DOI
10.1016/j.asoc.2020.106249
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The accurate estimation of missing data plays a vital role in ensuring a high level of data quality. The missing values should be imputed before performing data mining, machine learning, and other data processing tasks. Ten correlation-based imputation methods are proposed in this paper. All of these methods try to maximize the correlation between a missing feature and other features. The maximization is achieved by selecting segments of data that have strong correlations. The proposed approach involves the following main steps to impute each missing instance. First, a base set is selected from complete instances. Second, data segments with strong correlations are generated using the base set and the rest of the complete instances. Finally, each missing value is imputed by applying linear models to the discovered segments of data. This study considers seven real datasets from different fields with different missing rates. The imputation quality of the proposed methods is compared to those of seven other imputation approaches in terms of three well-known evaluation criteria. The experimental results reveal that the proposed approach has better imputation performance than competing imputation techniques in most cases. (C) 2020 Elsevier B.V. All rights reserved.
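The three steps named in the abstract (select a base set, grow a strongly correlated segment, fit a linear model) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not say how the base set is chosen or how segments are grown, so the nearest-neighbour base set, the greedy "keep the row if |r| does not drop" rule, the single-predictor regression, and the names `abs_corr`, `impute_value`, and `base_size` are all assumptions made for illustration.

```python
import numpy as np

def abs_corr(x, y):
    """Absolute Pearson correlation; 0.0 when either vector is constant."""
    if np.std(x) == 0 or np.std(y) == 0:
        return 0.0
    return abs(np.corrcoef(x, y)[0, 1])

def impute_value(complete, x_obs, target, predictor, base_size=5):
    """Estimate the missing `target` feature of one incomplete instance.

    complete  : 2-D array of complete instances (rows) by features (columns)
    x_obs     : observed value of the `predictor` feature in the incomplete instance
    """
    xs, ys = complete[:, predictor], complete[:, target]

    # Step 1 (assumed rule): base set = complete rows closest to x_obs on the predictor.
    order = np.argsort(np.abs(xs - x_obs))
    segment = list(order[:base_size])

    # Step 2 (assumed rule): greedily add remaining rows that do not lower |r|.
    best = abs_corr(xs[segment], ys[segment])
    for i in order[base_size:]:
        trial = segment + [i]
        r = abs_corr(xs[trial], ys[trial])
        if r >= best:
            segment, best = trial, r

    # Step 3: fit a least-squares line on the segment and predict the missing value.
    if np.std(xs[segment]) == 0:
        return float(np.mean(ys[segment]))  # degenerate segment: fall back to the mean
    slope, intercept = np.polyfit(xs[segment], ys[segment], 1)
    return float(slope * x_obs + intercept)
```

On data where the target is an exact linear function of the predictor, the sketch recovers that function; on real data, the quality of the estimate depends on the assumed selection rules, which may differ from those evaluated in the paper.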
Pages: 30