Differentiable and Scalable Generative Adversarial Models for Data Imputation

Cited by: 4
Authors
Wu, Yangyang [1 ]
Wang, Jun [2 ]
Miao, Xiaoye [1 ]
Wang, Wenjia [2 ]
Yin, Jianwei [3 ]
Affiliations
[1] Zhejiang Univ, Ctr Data Sci, Hangzhou 310058, Peoples R China
[2] Hong Kong Univ Sci & Technol, Kowloon, Hong Kong, Peoples R China
[3] Zhejiang Univ, Coll Comp Sci, Ctr Data Sci, Hangzhou 310058, Peoples R China
Keywords
Data imputation; generative adversarial network; large-scale incomplete data; efficient
DOI
10.1109/TKDE.2023.3293129
CLC Number
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Data imputation has been extensively explored to solve the missing data problem. However, the dramatically increasing volume of incomplete data makes imputation models computationally infeasible in many real-life applications. In this paper, we propose an effective and scalable imputation system named SCIS that significantly speeds up the training of differentiable generative adversarial imputation models with accuracy guarantees on large-scale incomplete data. SCIS consists of two modules: differentiable imputation modeling (DIM) and sample size estimation (SSE). DIM leverages a new masking Sinkhorn divergence function to make an arbitrary generative adversarial imputation model differentiable, while for such a differentiable imputation model, SSE estimates an appropriate sample size to ensure the user-specified imputation accuracy of the final model. Moreover, SCIS can also accelerate autoencoder-based imputation models. Extensive experiments on several real-life large-scale datasets demonstrate that the proposed system accelerates generative adversarial model training by 6.23x. Using only around 1.27% of the samples, SCIS yields accuracy competitive with state-of-the-art imputation methods in much shorter computation time.
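The masking Sinkhorn divergence at the core of DIM builds on the standard Sinkhorn divergence from entropic optimal transport. The sketch below is illustrative only, not the paper's implementation: it computes a debiased Sinkhorn divergence between two small batches via plain Sinkhorn iterations, plus a hypothetical masked variant that fills missing entries (mask == 0) with candidate imputed values before comparison. All function names, parameters, and the masking scheme here are assumptions.

```python
import numpy as np

def sinkhorn_cost(X, Y, eps=1.0, n_iters=300):
    """Entropic-regularized OT cost between the empirical
    distributions of the rows of X and Y (squared-Euclidean cost)."""
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    n, m = C.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-C / eps)                  # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):              # Sinkhorn fixed-point updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]       # transport plan
    return float((P * C).sum())

def sinkhorn_divergence(X, Y, eps=1.0):
    """Debiased Sinkhorn divergence: nonnegative, and zero (up to
    numerics) when the two empirical distributions coincide."""
    return (sinkhorn_cost(X, Y, eps)
            - 0.5 * sinkhorn_cost(X, X, eps)
            - 0.5 * sinkhorn_cost(Y, Y, eps))

def masked_divergence(X_obs, X_imp, mask, eps=1.0):
    """Hypothetical masked variant: complete the observed batch with
    imputed values where mask == 0, then compare the candidate
    imputation against the completed batch. Illustrative only."""
    completed = np.where(mask.astype(bool), X_obs, X_imp)
    return sinkhorn_divergence(completed, X_imp, eps)
```

In a GAN-style imputation loop, a generator's imputed batch would be scored against observed data with such a divergence instead of a discriminator's output, which is what makes the training objective differentiable end-to-end.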
Pages: 490-503 (14 pages)