An Expectation-Maximization Algorithm for Combining a Sample of Partially Overlapping Covariance Matrices

Cited by: 1
Authors
Akdemir, Deniz [1]
Somo, Mohamed [2]
Isidro-Sanchez, Julio [3]
Affiliations
[1] Ctr Int Bone Marrow Transplantat Res, Minneapolis, MN 55401 USA
[2] Syngenta Seeds, Junction City, KS 66441 USA
[3] Univ Politecn Madrid, Ctr Biotecnol & Genomica Plantas, Inst Nacl Invest & Tecnol Agr & Alimentaria, Madrid 28223, Spain
Keywords
imputation; covariance estimation; expectation-maximization; multi-view data; heterogeneous databases; INTEGRATIVE ANALYSIS; LIKELIHOOD; INFERENCE
DOI
10.3390/axioms12020161
Chinese Library Classification
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
The generation of unprecedented amounts of data brings new challenges in data management, but also an opportunity to accelerate the identification of processes across multiple scientific disciplines. One of these challenges is the harmonization of high-dimensional, unbalanced, and heterogeneous data. In this manuscript, we propose a statistical approach for combining incomplete and partially overlapping pieces of covariance matrices that come from independent experiments. We assume that the data are a random sample of partial covariance matrices drawn from Wishart distributions, and we derive an expectation-maximization algorithm for parameter estimation. We demonstrate the properties of our method using (i) simulation studies and (ii) empirical datasets. In general, the ability to make inferences about the covariance of variables not observed in the same experiment is valuable for data analysis, since covariance estimation is an important step in many statistical applications, such as multivariate analysis, principal component analysis, factor analysis, and structural equation modeling.
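The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `combine_partial_covariances`, the identity starting value, and the stopping rule are assumptions made here for illustration. The sketch further assumes that every partial matrix is positive definite, that all parts share the same Wishart degrees of freedom, and that each variable appears in at least one part. The E-step fills in the unobserved blocks of each partial matrix using the standard conditional expectations of a Wishart matrix given one of its diagonal blocks; the M-step averages the completed matrices.

```python
import numpy as np

def combine_partial_covariances(parts, p, n_iter=200, tol=1e-8):
    """EM-style sketch (hypothetical helper, not the paper's code).

    parts: list of (idx, S) pairs, where idx lists the observed variable
           indices and S is the matching len(idx) x len(idx) covariance.
    p:     total number of variables.
    """
    sigma = np.eye(p)  # assumed starting value
    for _ in range(n_iter):
        acc = np.zeros((p, p))
        for idx, S in parts:
            a = np.asarray(idx)
            S = np.asarray(S, dtype=float)
            b = np.setdiff1d(np.arange(p), a)  # unobserved variables
            full = np.empty((p, p))
            full[np.ix_(a, a)] = S
            if b.size:
                # Regression coefficients of b on a under the current estimate
                B = np.linalg.solve(sigma[np.ix_(a, a)], sigma[np.ix_(a, b)])
                S_ab = S @ B  # expected cross block
                # Conditional covariance of b given a
                cond = sigma[np.ix_(b, b)] - sigma[np.ix_(b, a)] @ B
                full[np.ix_(a, b)] = S_ab
                full[np.ix_(b, a)] = S_ab.T
                full[np.ix_(b, b)] = cond + B.T @ S_ab  # expected b block
            acc += full
        # M-step: average the completed matrices, then symmetrize
        new_sigma = acc / len(parts)
        new_sigma = (new_sigma + new_sigma.T) / 2
        converged = np.max(np.abs(new_sigma - sigma)) < tol
        sigma = new_sigma
        if converged:
            break
    return sigma
```

A call such as `combine_partial_covariances([([0, 1], S1), ([1, 2], S2)], p=3)` returns a 3 x 3 estimate that includes an entry for variables 0 and 2, which were never observed together. That entry is not determined by the data alone, so it depends on the model assumptions and the starting value.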
Pages: 17