FUSAIN: Combining Functional Dependencies and Clustering for Missing Values Imputation

Times cited: 0
Authors
Wu, Huaiguang [1]
Li, Shuaichao [2]
Shi, Wenjun [1]
Du, Shaoqing [2]
Affiliations
[1] Zhengzhou University of Light Industry, Faculty of Computer and Communication Engineering, Zhengzhou 450066, Henan, People's Republic of China
[2] Zhengzhou University of Light Industry, Zhengzhou 450066, Henan, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Missing value imputation; Affinity propagation clustering; Functional dependencies; K-nearest neighbor; Multiple imputation; Regression; Algorithm; Systems
DOI
Not available
Chinese Library Classification number
T [Industrial Technology]
Subject classification code
08 [Engineering]
Abstract
Missing data is a common problem in real-world datasets. Large amounts of missing data can severely degrade data quality and bias the results of data analysis, so missing values imputation (MVI) is a critical data preprocessing step. Most imputation methods model the distribution of the observed data to approximate the missing values. Such approaches usually fit a single distribution to the entire dataset, which ignores the dependencies within the data. In this paper, we propose FUSAIN (combining Functional dependencies and clUstering for miSsing vAlues ImputatioN), a novel hybrid imputation algorithm that integrates Functional Dependencies (FDs), K Nearest Neighbor (KNN), and Affinity Propagation (AP) clustering. The proposed algorithm not only considers the distribution of the data but also uses the dependency relationships represented by FDs to impute missing values. Experimental results show that the proposed algorithm achieves better imputation performance than common and popular imputation algorithms.
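The abstract combines two ideas: filling a missing value from a functional dependency when one applies, and otherwise borrowing values from nearby records within a cluster. The following is a minimal sketch of those two ideas. It is not the authors' FUSAIN algorithm. It rests on several assumptions: records are Python dicts with None marking a missing value, the FD (here `zip -> city`) is given in advance rather than discovered, and cluster labels are supplied from outside. The paper uses AP clustering, which this sketch does not implement. The names `fd_impute` and `knn_impute`, the labels, and the toy data are all hypothetical.

```python
import math
from statistics import mean


def fd_impute(rows, lhs, rhs):
    """Fill missing `rhs` values using an assumed functional dependency lhs -> rhs.

    Records whose lhs attributes and rhs value are all observed define a
    lookup table from lhs values to the rhs value. A record with a missing
    rhs and fully observed lhs copies the rhs of a matching record.
    Returns the number of values filled.
    """
    table = {}
    for r in rows:
        key = tuple(r[a] for a in lhs)
        if None not in key and r[rhs] is not None:
            # If the data violates the FD, the first observed value wins.
            table.setdefault(key, r[rhs])
    filled = 0
    for r in rows:
        key = tuple(r[a] for a in lhs)
        if r[rhs] is None and key in table:
            r[rhs] = table[key]
            filled += 1
    return filled


def knn_impute(rows, target, features, k=2, labels=None):
    """Fill missing numeric `target` values with the mean of the k nearest donors.

    Distance is Euclidean over the numeric `features` observed in both records.
    If `labels` (one cluster id per record) is given, donors are restricted to
    the same cluster as the incomplete record. Only records that were complete
    before this call act as donors.
    """
    donors = [j for j, r in enumerate(rows) if r[target] is not None]
    for i, r in enumerate(rows):
        if r[target] is not None:
            continue
        scored = []
        for j in donors:
            if labels is not None and labels[j] != labels[i]:
                continue
            d = rows[j]
            shared = [f for f in features if r[f] is not None and d[f] is not None]
            if not shared:
                continue
            dist = math.sqrt(sum((r[f] - d[f]) ** 2 for f in shared))
            scored.append((dist, d[target]))
        scored.sort(key=lambda t: t[0])
        nearest = scored[:k]
        if nearest:
            r[target] = mean(v for _, v in nearest)
    return rows


# Toy data (hypothetical): zip code -> city is treated as a known FD.
rows = [
    {"zip": "Z1", "city": "Zhengzhou", "age": 30, "income": 50.0},
    {"zip": "Z1", "city": None, "age": 32, "income": 52.0},
    {"zip": "Z2", "city": "Beijing", "age": 45, "income": 80.0},
    {"zip": "Z2", "city": "Beijing", "age": 40, "income": None},
    {"zip": "Z3", "city": "Zhengzhou", "age": 31, "income": 51.0},
]
# Hypothetical cluster labels, standing in for the output of a clustering step.
labels = [0, 0, 1, 1, 0]

n_fd = fd_impute(rows, ["zip"], "city")                  # fills rows[1]["city"]
knn_impute(rows, "income", ["age"], k=2, labels=labels)  # fills rows[3]["income"]
print(n_fd, rows[1]["city"], rows[3]["income"])          # prints: 1 Zhengzhou 80.0
```

In this sketch the FD is tried first because it gives an exact value whenever a matching record exists. The neighbor average is only a statistical estimate. Restricting donors by cluster label mirrors the abstract's point that one distribution for the whole dataset can be misleading. The abstract does not say whether FUSAIN orders or combines these steps in exactly this way.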
Pages: 513-521
Number of pages: 9