BAMITA: Bayesian multiple imputation for tensor arrays
Authors:
Jiang, Ziren [1]
Li, Gen [2]
Lock, Eric F. [1]
Affiliations:
[1] Univ Minnesota, Sch Publ Hlth, Div Biostat & Hlth Data Sci, 2221 Univ Ave SE, Minneapolis, MN 55414 USA
[2] Univ Michigan, Sch Publ Hlth, Dept Biostat, 1415 Washington Hts,M4210, Ann Arbor, MI 48109 USA
Data increasingly take the form of a multi-way array, or tensor, in several biomedical domains. Such tensors are often incompletely observed. For example, we are motivated by longitudinal microbiome studies in which several timepoints are missing for several subjects. There is a growing literature on missing data imputation for tensors. However, existing methods give a point estimate for missing values without capturing uncertainty. We propose a multiple imputation approach for tensors in a flexible Bayesian framework that yields realistic simulated values for missing entries and can propagate uncertainty through subsequent analyses. Our model uses efficient and widely applicable conjugate priors for a CANDECOMP/PARAFAC (CP) factorization, with a separable residual covariance structure. This approach is shown to perform well with respect to both imputation accuracy and uncertainty calibration, for scenarios in which either single entries or entire fibers of the tensor are missing. For two microbiome applications, it is shown to accurately capture uncertainty in the full microbiome profile at missing timepoints and is used to infer trends in species diversity for the population. Documented R code to perform our multiple imputation approach is available at https://github.com/lockEF/MultiwayImputation.
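To make the two main ingredients of the abstract concrete, the following is a minimal Python sketch, not the authors' R implementation and not the API of the MultiwayImputation repository. It shows (a) how a rank-R CP factorization rebuilds a three-way tensor from factor matrices and (b) how estimates from several completed tensors can be pooled with Rubin's rules so that imputation uncertainty carries into a downstream summary. The names `cp_reconstruct` and `rubin_pool`, the tensor dimensions, the noise level, and the way the imputed draws are generated are all assumptions made for illustration; in an actual Bayesian procedure the draws would come from the posterior of the factor matrices and the residual covariance.

```python
import numpy as np


def cp_reconstruct(A, B, C):
    """Rank-R CP tensor: X[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def rubin_pool(estimates, variances):
    """Pool per-imputation estimates and variances with Rubin's rules."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = estimates.size
    q_bar = estimates.mean()             # pooled point estimate
    within = variances.mean()            # average within-imputation variance
    between = estimates.var(ddof=1)      # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total


rng = np.random.default_rng(0)
I, J, K, R = 6, 5, 4, 2                  # hypothetical tensor dimensions and rank
A, B, C = (rng.normal(size=(n, R)) for n in (I, J, K))
signal = cp_reconstruct(A, B, C)
sigma = 0.1                              # hypothetical residual standard deviation
X = signal + sigma * rng.normal(size=signal.shape)
missing = rng.random(X.shape) < 0.2      # True marks an unobserved entry

# Stand-in for posterior draws (an assumption): low-rank signal plus fresh noise
# at the missing entries. A real sampler would also redraw A, B, C each time.
M = 20
estimates, variances = [], []
for _ in range(M):
    completed = X.copy()
    draw = signal + sigma * rng.normal(size=signal.shape)
    completed[missing] = draw[missing]
    estimates.append(completed.mean())                      # downstream summary
    variances.append(completed.var(ddof=1) / completed.size)

pooled_mean, pooled_var = rubin_pool(estimates, variances)
print(f"pooled mean = {pooled_mean:.4f}, pooled variance = {pooled_var:.6f}")
```

The pooled variance adds the between-imputation spread to the average within-imputation variance, which is the component a single point-estimate imputation would leave out.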