Unsupervised data imputation with multiple importance sampling variational autoencoders

Times Cited: 0
Authors
Kuang, Shenfen [1 ]
Huang, Yewen [2 ]
Song, Jie [1 ]
Affiliations
[1] Shaoguan Univ, Sch Math & Stat, Shaoguan 512005, Peoples R China
[2] Guangdong Polytech Normal Univ, Sch Elect & Informat, Guangzhou 510665, Peoples R China
Source
SCIENTIFIC REPORTS | 2025, Vol. 15, No. 1
Keywords
Missing data; Variational autoencoders; Multiple importance sampling; Resampling; Missing data imputation
DOI
10.1038/s41598-025-87641-0
Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject Classification Codes
07; 0710; 09
Abstract
Recently, deep latent variable models have made significant progress in dealing with missing data problems, benefiting from their ability to capture intricate and non-linear relationships within the data. In this work, we further investigate the potential of Variational Autoencoders (VAEs) in addressing the uncertainty associated with missing data via a multiple importance sampling strategy. We propose a Missing data Multiple Importance Sampling Variational Auto-Encoder (MMISVAE) method to effectively model incomplete data. Our approach consists of a learning step and an imputation step. During the learning step, the mixture components are represented by multiple separate encoder networks, which are later combined through simple averaging to enhance the latent representation capabilities of the VAEs when dealing with incomplete data. The statistical model and variational distributions are iteratively updated by maximizing the Multiple Importance Sampling Evidence Lower Bound (MISELBO) on the joint log-likelihood. In the imputation step, missing data are estimated using conditional expectation through multiple importance resampling. We propose an efficient imputation algorithm that broadens the scope of the Missing data Importance Weighted Auto-Encoder (MIWAE) by incorporating multiple proposal distributions and a resampling scheme. One notable characteristic of our method is the completely unsupervised nature of both the learning and imputation processes. Through comprehensive experimental analysis, we present evidence of the effectiveness of our method in improving the imputation accuracy of incomplete data when compared to current state-of-the-art VAE-based methods.
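The abstract describes two steps: fitting the model by maximizing a multiple importance sampling bound (MISELBO) with several encoder networks acting as mixture components combined by simple averaging, and imputing missing entries as a conditional expectation under importance weights. Below is a minimal sketch of both steps, assuming PyTorch, diagonal Gaussian encoders and decoder, and samples pooled across components; all dimensions, hyperparameters, and names (Encoder, decoder, miselbo_loss, impute) are illustrative assumptions, not the authors' implementation, and the self-normalized weighted mean stands in for the resampling step described in the paper.

import torch
import torch.nn as nn
from torch.distributions import Normal

D, H, Z = 8, 32, 4   # data, hidden, and latent dimensions (illustrative)
K, L = 3, 20         # number of encoder components and samples per component (illustrative)

class Encoder(nn.Module):
    """One mixture component q_k(z | x_obs), a diagonal Gaussian."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(D, H), nn.Tanh(), nn.Linear(H, 2 * Z))
    def forward(self, x):
        mu, log_sigma = self.net(x).chunk(2, dim=-1)
        return Normal(mu, log_sigma.exp())

encoders = nn.ModuleList(Encoder() for _ in range(K))
decoder = nn.Sequential(nn.Linear(Z, H), nn.Tanh(), nn.Linear(H, 2 * D))
prior = Normal(torch.zeros(Z), torch.ones(Z))

def log_weights(x, mask):
    """Log importance weights log p(z) + log p(x_obs | z) - log q_mix(z).
    x is (B, D) with missing entries zero-filled; mask is 1 where observed.
    Returns (K*L, B) log-weights and (K*L, B, D) decoder means."""
    qs = [enc(x * mask) for enc in encoders]                   # K proposals
    z = torch.cat([q.rsample((L,)) for q in qs], dim=0)        # (K*L, B, Z)
    # mixture proposal via simple averaging: q_mix(z) = (1/K) * sum_k q_k(z)
    log_q = torch.logsumexp(
        torch.stack([q.log_prob(z).sum(-1) for q in qs]), dim=0
    ) - torch.log(torch.tensor(float(K)))
    mu_x, log_sigma_x = decoder(z).chunk(2, dim=-1)
    # likelihood evaluated on observed coordinates only
    log_px = (Normal(mu_x, log_sigma_x.exp()).log_prob(x) * mask).sum(-1)
    log_pz = prior.log_prob(z).sum(-1)
    return log_pz + log_px - log_q, mu_x

def miselbo_loss(x, mask):
    """Negative multiple importance sampling lower bound (to minimize)."""
    log_w, _ = log_weights(x, mask)
    n_samples = torch.tensor(float(K * L))
    return -(torch.logsumexp(log_w, dim=0) - torch.log(n_samples)).mean()

@torch.no_grad()
def impute(x, mask):
    """Replace missing entries with the self-normalized importance-weighted mean."""
    log_w, mu_x = log_weights(x, mask)
    w = torch.softmax(log_w, dim=0).unsqueeze(-1)              # (K*L, B, 1)
    return x * mask + (w * mu_x).sum(0) * (1 - mask)

Training would loop miselbo_loss over mini-batches with a standard optimizer such as Adam; a resampling variant of impute would instead draw latent indices in proportion to the normalized weights (for example with torch.multinomial) before decoding and filling the missing entries.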
Pages: 16