Estimating missing data using novel correlation maximization based methods

Cited by: 16
Authors
Sefidian, Amir Masoud [1]
Daneshpour, Negin [1]
Affiliation
[1] Shahid Rajaee Teacher Training Univ, Fac Comp Engn, Tehran, Iran
Keywords
Missing values; Imputation; Correlation; Regression; FUZZY C-MEANS; K-NEAREST NEIGHBORS; VALUE IMPUTATION; GENETIC ALGORITHM; VALUES; CLASSIFICATION; REGRESSION; FRAMEWORK; SELECTION; PATTERNS;
DOI
10.1016/j.asoc.2020.106249
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The accurate estimation of missing data plays a vital role in ensuring a high level of data quality. Missing values should be imputed before data mining, machine learning, and other data processing tasks are performed. This paper proposes ten correlation-based imputation methods, all of which aim to maximize the correlation between a feature with missing values and the other features. The maximization is achieved by selecting segments of data that have strong correlations. To impute each incomplete instance, the approach proceeds in three main steps. First, a base set is selected from the complete instances. Second, strongly correlated data segments are generated from the base set and the remaining complete instances. Finally, each missing value is imputed by applying linear models to the discovered data segments. The study uses seven real datasets from different fields at various missing rates. The imputation quality of the proposed methods is compared with that of seven other imputation approaches using three well-known evaluation criteria. The experimental results show that the proposed approach outperforms the competing imputation techniques in most cases. (C) 2020 Elsevier B.V. All rights reserved.
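The three steps in the abstract can be illustrated with a small NumPy sketch. The abstract does not say how the base set is chosen, how segments are grown, or which linear model is fitted, so every one of those choices below is an assumption: the base set is taken as the `k` nearest complete instances on the observed features, the segment is grown greedily while the absolute Pearson correlation does not drop, and a single-predictor least-squares line is used. The function names `abs_corr` and `impute_value` and the parameter `k` are hypothetical; this is a sketch of the general idea, not any of the paper's ten methods.

```python
import numpy as np


def abs_corr(a, b):
    """Absolute Pearson correlation; returns 0.0 if either vector is constant."""
    if a.std() == 0 or b.std() == 0:
        return 0.0
    return abs(np.corrcoef(a, b)[0, 1])


def impute_value(complete, x, miss, k=5):
    """Hypothetical sketch: impute x[miss] from a correlation-maximizing segment.

    complete : (n, d) array of complete instances
    x        : (d,) instance whose feature `miss` is NaN
    miss     : index of the missing feature
    k        : size of the base set (assumed parameter, not from the paper)
    """
    obs = [j for j in range(complete.shape[1]) if j != miss and not np.isnan(x[j])]

    # Step 1 (assumed rule): base set = k nearest complete instances on observed features.
    dist = np.linalg.norm(complete[:, obs] - x[obs], axis=1)
    order = np.argsort(dist)
    segment = list(order[:k])

    # Choose the observed feature most correlated with the missing one in the base set.
    best = max(obs, key=lambda j: abs_corr(complete[segment, j], complete[segment, miss]))
    score = abs_corr(complete[segment, best], complete[segment, miss])

    # Step 2 (assumed rule): greedily add the remaining complete instances
    # while the segment's correlation does not decrease.
    for i in order[k:]:
        trial = segment + [i]
        s = abs_corr(complete[trial, best], complete[trial, miss])
        if s >= score:
            segment, score = trial, s

    # Step 3: fit y = a * z + b on the segment and predict the missing value.
    a, b = np.polyfit(complete[segment, best], complete[segment, miss], 1)
    return a * x[best] + b
```

On toy data where the second feature is exactly twice the first plus one, the fitted line is exact, so an instance whose first feature is 4.5 is imputed as approximately 10.0. The greedy growth is only one simple way to "maximize correlation"; a real implementation would also need to handle several missing features per instance and ties between candidate predictors.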
Pages: 30
Related papers
50 in total
  • [41] Addressing the Curse of Missing Data in Clinical Contexts: A Novel Approach to Correlation-based Imputation
    Curioso, Isabel
    Santos, Ricardo
    Ribeiro, Bruno
    Carreiro, Andre
    Coelho, Pedro
    Fragata, Jose
    Gamboa, Hugo
    JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2023, 35 (06)
  • [42] The correlation-assisted missing data estimator
    Cannings, Timothy I.
    Fan, Yingying
    JOURNAL OF MACHINE LEARNING RESEARCH, 2022, 23 : 1 - 49
  • [43] New Chain Imputation Methods for Estimating Population Mean in the Presence of Missing Data Using Two Auxiliary Variables
    Bhushan, Shashi
    Pandey, Abhay Pratap
    COMMUNICATIONS IN MATHEMATICS AND STATISTICS, 2023, 11 (02) : 325 - 340
  • [45] Missing value imputation using a novel grey based fuzzy c-means, mutual information based feature selection, and regression model
    Sefidian, Amir Masoud
    Daneshpour, Negin
    EXPERT SYSTEMS WITH APPLICATIONS, 2019, 115 : 68 - 94
  • [46] The Effects of Model Based Missing Data Methods on Guessing Parameter in Case of Ignorable Missing Data
    Kocak, Duygu
    PEGEM EGITIM VE OGRETIM DERGISI, 2018, 8 (01) : 155 - 171
  • [47] Comparison of Estimating Missing Values in IoT Time Series Data Using Different Interpolation Algorithms
    Ding, Zengyu
    Mei, Gang
    Cuomo, Salvatore
    Li, Yixuan
    Xu, Nengxiong
    INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2020, 48 (03) : 534 - 548
  • [49] ExtraImpute: A Novel Machine Learning Method for Missing Data Imputation
    Alabadla, Mustafa
    Sidi, Fatimah
    Ishak, Iskandar
    Ibrahim, Hamidah
    Affendey, Lilly Suriani
    Hamdan, Hazlina
    JOURNAL OF ADVANCES IN INFORMATION TECHNOLOGY, 2022, 13 (05) : 470 - 476
  • [50] Matrix and Tensor Based Methods for Missing Data Estimation in Large Traffic Networks
    Asif, Muhammad Tayyab
    Mitrovic, Nikola
    Dauwels, Justin
    Jaillet, Patrick
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2016, 17 (07) : 1816 - 1825