Missing data analyses: a hybrid multiple imputation algorithm using Gray System Theory and entropy based on clustering

Cited by: 50
Authors
Tian, Jing [1]
Yu, Bing [1]
Yu, Dan [1]
Ma, Shilong [1, 2]
Affiliations
[1] Beihang Univ, State Key Lab Software Dev Environm, Beijing, Peoples R China
[2] Beihang Univ, Sch Comp Sci & Engn, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Missing data; Multiple imputation; Gray System Theory; Entropy; Clustering; ESTIMATING NULL VALUES; FUZZY C-MEANS; INFORMATION;
DOI
10.1007/s10489-013-0469-x
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Researchers and practitioners who work with databases often find knowledge discovery and application development cumbersome because of missing data. Although some approaches tolerate a certain rate of incomplete data, most of them demand complete, high-quality data. Consequently, many strategies have been designed to handle missingness, particularly by imputation. Single imputation methods initially succeeded in predicting missing values for specific types of distributions. Multiple imputation algorithms have nevertheless remained prevalent, because they further improve validity by iteratively minimizing bias and require less prior knowledge of the distributions. This article reviews the state of the art and proposes a hybrid missing-data completion method named Multiple Imputation using Gray-system-theory and Entropy based on Clustering (MIGEC). First, the non-missing data instances are separated into several clusters. Then, for each incomplete instance, the imputed value is obtained after multiple calculations that use the information entropy of the proximal category, where proximity is determined by a similarity metric based on Gray System Theory (GST). Experimental results on University of California Irvine (UCI) datasets show that MIGEC is more accurate than other current methods for both numeric and categorical attributes under different missing mechanisms. Further discussion of real aerospace datasets indicates that MIGEC is also applicable to this specific area, with generally more precise inference and faster convergence than other multiple imputation methods.
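The abstract gives only the outline of the method: cluster the complete instances, find the proximal cluster of each incomplete instance with a gray-relational similarity, and impute using entropy information. The Python sketch below illustrates that outline and should not be read as the authors' MIGEC implementation. Several parts are assumptions not stated in the record: plain k-means for the clustering step, Deng's gray relational grade with distinguishing coefficient `rho = 0.5` as the similarity, the standard entropy weight method for attribute weights, a grade-weighted mean as the imputed value, and a single pass instead of the repeated calculations the abstract mentions. All function names and the toy data are hypothetical.

```python
import numpy as np

def entropy_weights(X, eps=1e-12):
    """Entropy weight method (assumed here): lower-entropy attributes get larger weights.
    X must be non-negative."""
    n, m = X.shape
    P = (X + eps) / (X + eps).sum(axis=0)            # column-wise proportions
    e = -(P * np.log(P)).sum(axis=0) / np.log(n)     # normalized entropy per attribute
    d = 1.0 - e
    if d.sum() <= eps:                               # all attributes uninformative
        return np.full(m, 1.0 / m)
    return d / d.sum()

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means, used as a stand-in for the clustering step."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

def gray_relational_grade(x, refs, weights, rho=0.5):
    """Weighted gray relational grade between x and each row of refs,
    computed only on the attributes observed (non-NaN) in x."""
    obs = ~np.isnan(x)
    delta = np.abs(refs[:, obs] - x[obs])
    dmin, dmax = delta.min(), delta.max()
    if dmax == 0:                                    # x matches every reference exactly
        return np.ones(len(refs))
    coeff = (dmin + rho * dmax) / (delta + rho * dmax)
    w = weights[obs] / weights[obs].sum()
    return coeff @ w

def impute_row(x, complete, centers, labels, weights):
    """Fill NaNs in x with a grade-weighted mean over the members of the proximal cluster."""
    nearest = np.argmax(gray_relational_grade(x, centers, weights))
    members = complete[labels == nearest]
    grades = gray_relational_grade(x, members, weights)
    filled = x.copy()
    miss = np.isnan(x)
    filled[miss] = (grades @ members[:, miss]) / grades.sum()
    return filled

# Toy data (hypothetical): two well-separated groups of complete instances.
complete = np.array([
    [1.0, 1.2, 0.9], [1.1, 0.9, 1.0], [0.9, 1.0, 1.2], [1.2, 1.1, 1.1],
    [10.0, 10.2, 9.9], [10.1, 9.9, 10.0], [9.9, 10.0, 10.2], [10.2, 10.1, 10.1],
])
centers, labels = kmeans(complete, k=2)
weights = entropy_weights(complete)
filled = impute_row(np.array([1.0, np.nan, 1.1]), complete, centers, labels, weights)
print(filled)
```

In this toy run the incomplete instance lies close to the first group, so the missing second attribute is filled from that group's members only. In practice the attributes would need to be scaled to a common range before computing gray relational grades, and categorical attributes would need a separate treatment that this sketch omits.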
Pages: 376-388
Page count: 13
Related papers
50 in total
  • [1] Missing data analyses: a hybrid multiple imputation algorithm using Gray System Theory and entropy based on clustering
    Jing Tian
    Bing Yu
    Dan Yu
    Shilong Ma
    Applied Intelligence, 2014, 40 : 376 - 388
  • [2] Clustering-Based Multiple Imputation via Gray Relational Analysis for Missing Data and Its Application to Aerospace Field
    Tian, Jing
    Yu, Bing
    Yu, Dan
    Ma, Shilong
    SCIENTIFIC WORLD JOURNAL, 2013,
  • [3] Clustering-Based Hybrid Approach for Multivariate Missing Data Imputation
    Dubey, Aditya
    Rasool, Akhtar
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2020, 11 (11) : 710 - 714
  • [4] A new iterative fuzzy clustering algorithm for multiple imputation of missing data
    Nikfalazar, Sanaz
    Yeh, Chung-Hsing
    Bedingfield, Susan
    Khorshidi, Hadi A.
    2017 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE), 2017,
  • [5] Analyses using multiple imputation need to consider missing data in auxiliary variables
    Madley-Dowd, Paul
    Curnow, Elinor
    Hughes, Rachael A.
    Cornish, Rosie P.
    Tilling, Kate
    Heron, Jon
    AMERICAN JOURNAL OF EPIDEMIOLOGY, 2025,
  • [6] Partial distance evidential clustering for missing data with multiple imputation
    Tian, Hong-Peng
    Zhang, Zhen
    KNOWLEDGE-BASED SYSTEMS, 2025, 310
  • [7] A hybrid clustering algorithm based on missing attribute interval estimation for incomplete data
    Zhang, Li
    Bing, Zhaohong
    Zhang, Liyong
    PATTERN ANALYSIS AND APPLICATIONS, 2015, 18 (02) : 377 - 384
  • [8] A Noise-Aware Multiple Imputation Algorithm for Missing Data
    Li, Fangfang
    Sun, Hui
    Gu, Yu
    Yu, Ge
    MATHEMATICS, 2023, 11 (01)
  • [9] Accounting for missing data in statistical analyses: multiple imputation is not always the answer
    Hughes, Rachael A.
    Heron, Jon
    Sterne, Jonathan A. C.
    Tilling, Kate
    INTERNATIONAL JOURNAL OF EPIDEMIOLOGY, 2019, 48 (04) : 1294 - 1304
  • [10] Clustering with missing and left-censored data: A simulation study comparing multiple-imputation-based procedures
    Faucheux, Lilith
    Resche-Rigon, Matthieu
    Curis, Emmanuel
    Soumelis, Vassili
    Chevret, Sylvie
    BIOMETRICAL JOURNAL, 2021, 63 (02) : 372 - 393