Unsupervised data imputation with multiple importance sampling variational autoencoders

Times Cited: 0
Authors
Kuang, Shenfen [1 ]
Huang, Yewen [2 ]
Song, Jie [1 ]
Affiliations
[1] Shaoguan Univ, Sch Math & Stat, Shaoguan 512005, Peoples R China
[2] Guangdong Polytech Normal Univ, Sch Elect & Informat, Guangzhou 510665, Peoples R China
Source
SCIENTIFIC REPORTS | 2025, Vol. 15, No. 1
Keywords
Missing data; Variational autoencoders; Multiple importance sampling; Resampling; Missing data imputation
DOI
10.1038/s41598-025-87641-0
Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject Classification Codes
07; 0710; 09
Abstract
Recently, deep latent variable models have made significant progress in dealing with missing data problems, benefiting from their ability to capture intricate and non-linear relationships within the data. In this work, we further investigate the potential of Variational Autoencoders (VAEs) in addressing the uncertainty associated with missing data via a multiple importance sampling strategy. We propose a Missing data Multiple Importance Sampling Variational Auto-Encoder (MMISVAE) method to effectively model incomplete data. Our approach consists of a learning step and an imputation step. During the learning step, the mixture components are represented by multiple separate encoder networks, which are later combined through simple averaging to enhance the latent representation capabilities of the VAEs when dealing with incomplete data. The statistical model and variational distributions are iteratively updated by maximizing the Multiple Importance Sampling Evidence Lower Bound (MISELBO) on the joint log-likelihood. In the imputation step, missing data are estimated using conditional expectation through multiple importance resampling. We propose an efficient imputation algorithm that broadens the scope of the Missing data Importance Weighted Auto-Encoder (MIWAE) by incorporating multiple proposal distributions and a resampling scheme. One notable characteristic of our method is the completely unsupervised nature of both the learning and imputation processes. Through comprehensive experimental analysis, we present evidence of the effectiveness of our method in improving the imputation accuracy of incomplete data when compared to current state-of-the-art VAE-based methods.
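The abstract describes two steps: fitting the model by maximizing a multiple importance sampling bound (MISELBO) with several encoder networks acting as mixture components combined by simple averaging, and imputing missing entries as a conditional expectation under importance weights. Below is a minimal sketch of both steps, assuming PyTorch, diagonal Gaussian encoders and decoder, and samples pooled across components; all dimensions, hyperparameters, and names (Encoder, decoder, miselbo_loss, impute) are illustrative assumptions, not the authors' implementation, and the self-normalized weighted mean stands in for the resampling step described in the paper.

import torch
import torch.nn as nn
from torch.distributions import Normal

D, H, Z = 8, 32, 4   # data, hidden, and latent dimensions (illustrative)
K, L = 3, 20         # number of encoder components and samples per component (illustrative)

class Encoder(nn.Module):
    """One mixture component q_k(z | x_obs), a diagonal Gaussian."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(D, H), nn.Tanh(), nn.Linear(H, 2 * Z))
    def forward(self, x):
        mu, log_sigma = self.net(x).chunk(2, dim=-1)
        return Normal(mu, log_sigma.exp())

encoders = nn.ModuleList(Encoder() for _ in range(K))
decoder = nn.Sequential(nn.Linear(Z, H), nn.Tanh(), nn.Linear(H, 2 * D))
prior = Normal(torch.zeros(Z), torch.ones(Z))

def log_weights(x, mask):
    """Log importance weights log p(z) + log p(x_obs | z) - log q_mix(z).
    x is (B, D) with missing entries zero-filled; mask is 1 where observed.
    Returns (K*L, B) log-weights and (K*L, B, D) decoder means."""
    qs = [enc(x * mask) for enc in encoders]                   # K proposals
    z = torch.cat([q.rsample((L,)) for q in qs], dim=0)        # (K*L, B, Z)
    # mixture proposal via simple averaging: q_mix(z) = (1/K) * sum_k q_k(z)
    log_q = torch.logsumexp(
        torch.stack([q.log_prob(z).sum(-1) for q in qs]), dim=0
    ) - torch.log(torch.tensor(float(K)))
    mu_x, log_sigma_x = decoder(z).chunk(2, dim=-1)
    # likelihood evaluated on observed coordinates only
    log_px = (Normal(mu_x, log_sigma_x.exp()).log_prob(x) * mask).sum(-1)
    log_pz = prior.log_prob(z).sum(-1)
    return log_pz + log_px - log_q, mu_x

def miselbo_loss(x, mask):
    """Negative multiple importance sampling lower bound (to minimize)."""
    log_w, _ = log_weights(x, mask)
    n_samples = torch.tensor(float(K * L))
    return -(torch.logsumexp(log_w, dim=0) - torch.log(n_samples)).mean()

@torch.no_grad()
def impute(x, mask):
    """Replace missing entries with the self-normalized importance-weighted mean."""
    log_w, mu_x = log_weights(x, mask)
    w = torch.softmax(log_w, dim=0).unsqueeze(-1)              # (K*L, B, 1)
    return x * mask + (w * mu_x).sum(0) * (1 - mask)

Training would loop miselbo_loss over mini-batches with a standard optimizer such as Adam; a resampling variant of impute would instead draw latent indices in proportion to the normalized weights (for example with torch.multinomial) before decoding and filling the missing entries.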
Pages: 16