Estimating missing data using novel correlation maximization based methods

Cited by: 16
Authors
Sefidian, Amir Masoud [1]
Daneshpour, Negin [1]
Affiliations
[1] Shahid Rajaee Teacher Training University, Faculty of Computer Engineering, Tehran, Iran
Keywords
Missing values; Imputation; Correlation; Regression; FUZZY C-MEANS; K-NEAREST NEIGHBORS; VALUE IMPUTATION; GENETIC ALGORITHM; VALUES; CLASSIFICATION; REGRESSION; FRAMEWORK; SELECTION; PATTERNS;
DOI
10.1016/j.asoc.2020.106249
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The accurate estimation of missing data plays a vital role in ensuring a high level of data quality. The missing values should be imputed before performing data mining, machine learning, and other data processing tasks. Ten correlation-based imputation methods are proposed in this paper. All of these methods try to maximize the correlation between a missing feature and other features. The maximization is achieved by selecting segments of data that have strong correlations. The proposed approach involves the following main steps to impute each missing instance. First, a base set is selected from complete instances. Second, data segments with strong correlations are generated using the base set and the rest of the complete instances. Finally, each missing value is imputed by applying linear models to the discovered segments of data. This study considers seven real datasets from different fields with different missing rates. The imputation quality of the proposed methods is compared to those of seven other imputation approaches in terms of three well-known evaluation criteria. The experimental results reveal that the proposed approach has better imputation performance than competing imputation techniques in most cases. (C) 2020 Elsevier B.V. All rights reserved.
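The three steps named in the abstract (base set, correlated segment, linear model) can be illustrated with a minimal sketch. This is not any of the paper's ten methods: the abstract does not say how the base set is chosen or how segments are grown, so the nearest-neighbour base set, the greedy growth rule, the single-predictor linear fit, and the names `abs_corr`, `impute_with_correlated_segment`, and `base_size` are all assumptions made for illustration.

```python
import numpy as np


def abs_corr(x, y):
    """Absolute Pearson correlation; 0.0 when either vector is constant."""
    if np.std(x) == 0 or np.std(y) == 0:
        return 0.0
    return abs(np.corrcoef(x, y)[0, 1])


def impute_with_correlated_segment(complete, record, target, base_size=5):
    """Estimate record[target] from a strongly correlated segment of `complete`.

    complete : 2-D array of instances with no missing values
    record   : 1-D array whose only missing entry is record[target]
    """
    observed = [j for j in range(complete.shape[1]) if j != target]

    # Step 1 (assumed rule): the base set is the `base_size` complete
    # instances closest to the record on its observed features.
    dist = np.linalg.norm(complete[:, observed] - record[observed], axis=1)
    order = np.argsort(dist)
    segment = list(order[:base_size])

    # Use the observed feature most correlated with the target in the base set.
    base = complete[segment]
    predictor = max(observed, key=lambda j: abs_corr(base[:, j], base[:, target]))

    # Step 2 (assumed rule): grow the segment greedily, keeping an instance
    # only if it does not lower the predictor/target correlation.
    best = abs_corr(base[:, predictor], base[:, target])
    for i in order[base_size:]:
        trial = complete[segment + [i]]
        score = abs_corr(trial[:, predictor], trial[:, target])
        if score >= best:
            segment.append(i)
            best = score

    # Step 3: fit a straight line on the segment and predict the missing value.
    seg = complete[segment]
    slope, intercept = np.polyfit(seg[:, predictor], seg[:, target], 1)
    return slope * record[predictor] + intercept
```

For example, if every complete instance but one lies on the line y = 2x + 1, the greedy step should leave the off-line instance out of the segment, and a record with x = 4.5 is imputed as approximately 10.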
Pages: 30
Related papers
50 in total
  • [21] Imputations of missing values using a tracking-removed autoencoder trained with incomplete data
    Lai, Xiaochen
    Wu, Xia
    Zhang, Liyong
    Lu, Wei
    Zhong, Chongquan
    NEUROCOMPUTING, 2019, 366 : 54 - 65
  • [22] An Unsupervised Data-Mining and Generative-Based Multiple Missing Data Imputation Network for Energy Dataset
    Kim, Hyung Joon
    Kim, Mun Kyeom
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2024, 20 (11) : 13429 - 13440
  • [23] Novel genetic-based negative correlation learning for estimating soil temperature
    Kazemi, S. M. R.
    Bidgoli, Behrouz Minaei
    Shamshirband, Shahaboddin
    Karimi, Seyed Mehdi
    Ghorbani, Mohammad Ali
    Chau, Kwok-wing
    Pour, Reza Kazem
    ENGINEERING APPLICATIONS OF COMPUTATIONAL FLUID MECHANICS, 2018, 12 (01) : 506 - 516
  • [24] Methods for Dealing With Missing Covariate Data in Epigenome-Wide Association Studies
    Mills, Harriet L.
    Heron, Jon
    Relton, Caroline
    Suderman, Matt
    Tilling, Kate
    AMERICAN JOURNAL OF EPIDEMIOLOGY, 2019, 188 (11) : 2021 - 2030
  • [25] NOVEL ENSEMBLE TECHNIQUES FOR REGRESSION WITH MISSING DATA
    Hassan, Mostafa M.
    Atiya, Amir F.
    El Gayar, Neamat
    El-Fouly, Raafat
    NEW MATHEMATICS AND NATURAL COMPUTATION, 2009, 5 (03) : 635 - 652
  • [26] New Imputation Method for Estimating Population Mean in the Presence of Missing Data
    Lawson, Nuanpan
    LOBACHEVSKII JOURNAL OF MATHEMATICS, 2023, 44 (09) : 3740 - 3748
  • [27] Feature selection with missing data using mutual information estimators
    Doquire, Gauthier
    Verleysen, Michel
    NEUROCOMPUTING, 2012, 90 : 3 - 11
  • [28] Missing Data Prediction using Correlation Genetic Algorithm and SVM Approach
    Alhroob, Aysh
    Alzyadat, Wael
    Almukahel, Ikhlas
    Altarawneh, Hassan
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2020, 11 (02) : 703 - 709
  • [29] A Traffic Flow Data Quality Repair Model Based on Spatiotemporal Correlation
    Li, Yan
    Xu, Liangjie
    Qin, Wendie
    Xie, Cong
    Ji, Chuanwang
    IEEE ACCESS, 2024, 12 : 116816 - 116828
  • [30] A novel imputation method for missing values in air pollutant time series data
    Pena, Mario
    Ortega, Patricia
    Orellana, Marcos
    2019 IEEE LATIN AMERICAN CONFERENCE ON COMPUTATIONAL INTELLIGENCE (LA-CCI), 2019 : 99 - 104