FUSAIN: Combining Functional Dependencies and Clustering for Missing Values Imputation

Cited by: 0

Authors:

Wu, Huaiguang [1]

Li, Shuaichao [2]

Shi, Wenjun [1]

Du, Shaoqing [2]

Affiliations:

[1] Zhengzhou Univ Light Ind, Fac Comp & Commun Engn, Zhengzhou 450066, Henan, Peoples R China

[2] Zhengzhou Univ Light Ind, Zhengzhou 450066, Henan, Peoples R China

Source:

ENGINEERING LETTERS | 2022 / Vol. 30 / Issue 02

Funding:

National Natural Science Foundation of China;

Keywords:

Missing value imputation; Affinity propagation clustering; Functional dependencies; K nearest neighbor; MULTIPLE IMPUTATION; REGRESSION; ALGORITHM; SYSTEMS;

DOI:

Not available

Chinese Library Classification (CLC) number:

T [Industrial Technology];

Discipline classification code:

08;

Abstract:

Missing data is a common problem in real-world datasets. Large amounts of missing data seriously degrade data quality and bias the results of data analysis, which makes missing values imputation (MVI) a critical data preprocessing step. Most imputation methods model the distribution of the observed data to approximate the missing values. Such approaches usually fit a single distribution to the entire dataset and therefore ignore dependencies within the data. In this paper, we propose a novel hybrid imputation algorithm, combining Functional dependencies and clUstering for miSsing vAlues ImputatioN (FUSAIN), which combines Functional Dependencies (FDs), K Nearest Neighbor (KNN), and Affinity Propagation (AP) clustering. The proposed algorithm considers not only the distribution of the data but also the dependency relationships expressed by FDs when imputing missing values. Experimental results show that the proposed algorithm achieves better imputation performance than common, popular imputation algorithms.
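The abstract names the ingredients but not how they are combined. As a rough illustration only, the Python sketch below shows two of them in isolation: filling a missing attribute from records that agree on the left-hand side of an FD (X -> Y), and a KNN fallback that averages a numeric target over the k nearest complete records. It is not the FUSAIN algorithm: AP clustering is omitted, and the function names `impute_by_fd` and `impute_by_knn`, the dict-per-record representation, the majority-vote rule for conflicting donors, and the FD-then-KNN order in the demo are all hypothetical choices made for this example.

```python
import math
from collections import Counter
from typing import Any, Dict, List

# Hypothetical toy representation: attribute name -> value, None marks a missing cell.
Record = Dict[str, Any]


def impute_by_fd(rows: List[Record], lhs: List[str], rhs: str) -> int:
    """Fill missing `rhs` values using the functional dependency lhs -> rhs.

    Records that agree on every `lhs` attribute must, under the FD, also agree
    on `rhs`, so an observed value can be copied across. If the observed donors
    disagree (the FD does not hold exactly in the data), the most frequent
    value is used (an assumption of this sketch). Returns the number of cells filled.
    """
    donors: Dict[tuple, List[Any]] = {}
    for r in rows:
        key = tuple(r[a] for a in lhs)
        if None not in key and r[rhs] is not None:
            donors.setdefault(key, []).append(r[rhs])

    filled = 0
    for r in rows:
        if r[rhs] is None:
            key = tuple(r[a] for a in lhs)
            if key in donors:  # keys containing None never match a donor key
                r[rhs] = Counter(donors[key]).most_common(1)[0][0]
                filled += 1
    return filled


def impute_by_knn(rows: List[Record], target: str, features: List[str], k: int = 2) -> int:
    """Fill missing numeric `target` values with the mean target of the k
    nearest donor records (Euclidean distance over numeric `features`).

    Donors are the records observed on `target` and all `features` before any
    imputation, so imputed values are never reused as donors.
    Returns the number of cells filled.
    """
    def point(r: Record) -> List[float]:
        return [float(r[f]) for f in features]

    donors = [r for r in rows
              if r[target] is not None and all(r[f] is not None for f in features)]
    filled = 0
    for r in rows:
        if r[target] is None and all(r[f] is not None for f in features) and donors:
            p = point(r)
            nearest = sorted(donors, key=lambda d: math.dist(p, point(d)))[:k]
            r[target] = sum(d[target] for d in nearest) / len(nearest)
            filled += 1
    return filled


if __name__ == "__main__":
    table = [
        {"dept": "CS", "building": "B1", "age": 30, "salary": 50.0},
        {"dept": "CS", "building": None, "age": 32, "salary": 52.0},
        {"dept": "EE", "building": "B2", "age": 45, "salary": 80.0},
        {"dept": "EE", "building": "B2", "age": 47, "salary": None},
        {"dept": "EE", "building": "B2", "age": 50, "salary": 90.0},
    ]
    # Hypothetical order: exploit the FD dept -> building first, then KNN for salary.
    impute_by_fd(table, ["dept"], "building")
    impute_by_knn(table, "salary", ["age"], k=2)
    for row in table:
        print(row)
```

On this toy table, the FD `dept -> building` fills the second record's `building` with `B1`, and the KNN step sets the missing `salary` to 85.0, the mean of the two records closest in age (45 and 50). Trying the FD first prefers a value implied by a dependency in the data over a distance-based estimate, in the spirit of the abstract's motivation; how FUSAIN itself orders, weights, or clusters these steps is not described in this record.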

Pages: 513-521

Page count: 9

References: 43
