Data fusion using factor analysis and low-rank matrix completion

Authors
Ahfock, Daniel [1]
Pyne, Saumyadipta [2, 3, 4]
McLachlan, Geoffrey J. [1]
Affiliations
[1] Univ Queensland, Sch Math & Phys, Brisbane, Qld, Australia
[2] Univ Pittsburgh, Grad Sch Publ Hlth, Publ Hlth Dynam Lab, Pittsburgh, PA USA
[3] Univ Pittsburgh, Grad Sch Publ Hlth, Dept Biostat, Pittsburgh, PA 15261 USA
[4] Hlth Analyt Network, Pittsburgh, PA USA
Funding
Australian Research Council
Keywords
Data fusion; Statistical file-matching; Low-rank matrix completion; Factor analysis; ALGORITHMS; NUMBER;
DOI
10.1007/s11222-021-10033-7
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Data fusion involves the integration of multiple related datasets. The statistical file-matching problem is a canonical data fusion problem in multivariate analysis, where the objective is to characterise the joint distribution of a set of variables when only strict subsets of marginal distributions have been observed. Estimation of the covariance matrix of the full set of variables is challenging given the missing-data pattern. Factor analysis models use lower-dimensional latent variables in the data-generating process, and this introduces low-rank components in the complete-data matrix and the population covariance matrix. The low-rank structure of the factor analysis model can be exploited to estimate the full covariance matrix from incomplete data via low-rank matrix completion. We prove the identifiability of the factor analysis model in the statistical file-matching problem under conditions on the number of factors and the number of shared variables over the observed marginal subsets. Additionally, we provide an EM algorithm for parameter estimation. On several real datasets, the factor model gives smaller reconstruction errors in file-matching problems than the common approaches for low-rank matrix completion.
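The low-rank idea in the abstract can be illustrated at the population level. The sketch below is not the paper's EM algorithm and does not reproduce its identifiability conditions. It assumes a two-factor model with hypothetical loadings, two files that observe (X, Y) and (X, Z) respectively, and shared variables X that split into two groups X1 and X2 whose loading blocks are each invertible. Under those assumptions the never-observed block Cov(Y, Z) can be recovered from observed off-diagonal covariances alone, because the diagonal uniquenesses do not enter any off-diagonal block.

```python
import numpy as np

# Hypothetical loadings for q = 2 factors (illustrative values, not from the paper).
# Rows: X1 (2 vars), X2 (2 vars), Y (2 vars), Z (2 vars).
loadings = np.array([
    [1.0, 0.0],    # X1
    [0.0, 1.0],    # X1
    [1.0, 1.0],    # X2
    [1.0, -1.0],   # X2
    [0.5, 2.0],    # Y
    [-1.0, 0.3],   # Y
    [0.7, -0.4],   # Z
    [1.5, 0.9],    # Z
])
# Hypothetical uniquenesses (diagonal noise variances).
psi = np.array([0.5, 0.8, 0.3, 0.6, 0.4, 0.7, 0.9, 0.2])

# Population covariance of the factor model: Sigma = Lambda Lambda^T + Psi.
sigma = loadings @ loadings.T + np.diag(psi)

x1, x2, y, z = [0, 1], [2, 3], [4, 5], [6, 7]

# Blocks that are observable in the file-matching setting:
# file A sees (X, Y), file B sees (X, Z); Cov(Y, Z) is never observed.
s_y_x1 = sigma[np.ix_(y, x1)]    # from file A
s_x2_x1 = sigma[np.ix_(x2, x1)]  # from either file
s_x2_z = sigma[np.ix_(x2, z)]    # from file B

# Low-rank completion of the missing block:
# Cov(Y, Z) = Cov(Y, X1) Cov(X2, X1)^{-1} Cov(X2, Z)
# when the X1 and X2 loading blocks are both invertible (q x q).
sigma_yz_hat = s_y_x1 @ np.linalg.solve(s_x2_x1, s_x2_z)

# Compare with the true (unobservable in practice) block.
sigma_yz_true = sigma[np.ix_(y, z)]
max_abs_error = np.max(np.abs(sigma_yz_hat - sigma_yz_true))
print("max abs error:", max_abs_error)
```

With sample covariances instead of population ones the recovery is only approximate, and a likelihood-based fit such as the EM algorithm mentioned in the abstract would be the natural route.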
Pages: 12
Related papers
50 in total
  • [1] Data fusion using factor analysis and low-rank matrix completion
    Daniel Ahfock
    Saumyadipta Pyne
    Geoffrey J. McLachlan
    Statistics and Computing, 2021, 31
  • [2] Low-Rank Matrix Completion
    Chi, Yuejie
    IEEE SIGNAL PROCESSING MAGAZINE, 2018, 35 (05) : 178 - 181
  • [3] Low-rank Matrix Completion using Alternating Minimization
    Jain, Prateek
    Netrapalli, Praneeth
    Sanghavi, Sujay
    STOC'13: PROCEEDINGS OF THE 2013 ACM SYMPOSIUM ON THEORY OF COMPUTING, 2013, : 665 - 674
  • [4] Reflection Removal Using Low-Rank Matrix Completion
    Han, Byeong-Ju
    Sim, Jae-Young
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 3872 - 3880
  • [5] A Converse to Low-Rank Matrix Completion
    Pimentel-Alarcon, Daniel L.
    Nowak, Robert D.
    2016 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY, 2016, : 96 - 100
  • [6] DECENTRALIZED LOW-RANK MATRIX COMPLETION
    Ling, Qing
    Xu, Yangyang
    Yin, Wotao
    Wen, Zaiwen
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 2925 - 2928
  • [7] Adaptive Low-Rank Matrix Completion
    Tripathi, Ruchi
    Mohan, Boda
    Rajawat, Ketan
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2017, 65 (14) : 3603 - 3616
  • [8] Rank Determination for Low-Rank Data Completion
    Ashraphijuo, Morteza
    Wang, Xiaodong
    Aggarwal, Vaneet
    JOURNAL OF MACHINE LEARNING RESEARCH, 2017, 18
  • [9] Gene expression prediction using low-rank matrix completion
    Kapur, Arnav
    Marwah, Kshitij
    Alterovitz, Gil
    BMC BIOINFORMATICS, 2016, 17