Deep collective matrix factorization for augmented multi-view learning

Cited by: 0
Authors
Ragunathan Mariappan
Vaibhav Rajan
Affiliation
[1] National University of Singapore,School of Computing
Source
Machine Learning | 2019 / Vol. 108
Keywords
Collective Matrix Factorization; Deep learning; Augmented multi-view learning; Bayesian optimization; Recommendation; Gene-disease prioritization
DOI
Not available
Abstract
Learning by integrating multiple heterogeneous data sources is a common requirement in many tasks. Collective Matrix Factorization (CMF) is a technique to learn shared latent representations from arbitrary collections of matrices. It can be used to simultaneously complete one or more matrices, that is, to predict their unknown entries. Classical CMF methods assume that latent factors interact linearly, an assumption that can be restrictive and fails to capture complex non-linear interactions. In this paper, we develop the first deep-learning based method, called dCMF, for unsupervised learning of multiple shared representations from an arbitrary collection of matrices; it can model such non-linear interactions. We address optimization challenges that arise from dependencies between shared representations through multi-task Bayesian optimization, and we design an acquisition function adapted for collective learning of hyperparameters. Our experiments show that dCMF significantly outperforms previous CMF algorithms in integrating heterogeneous data for predictive modeling. Further, on two tasks (recommendation and prediction of gene-disease association), dCMF outperforms state-of-the-art matrix completion algorithms that can utilize auxiliary sources of information.
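For readers unfamiliar with the classical, linear CMF that the abstract contrasts dCMF against, the following is a minimal sketch, not the authors' method and not dCMF. It assumes two matrices that share their row entities, a mask marking the observed entries of the first matrix, and plain gradient descent on a squared-error loss with an L2 penalty. The function name `cmf_fit`, its parameters, and the synthetic data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def cmf_fit(X, Y, mask_X, k=2, lr=0.01, reg=0.01, epochs=2000, seed=0):
    """Hypothetical helper: fit X ~ U @ V.T and Y ~ U @ W.T with a shared U.

    mask_X is 1 where X is observed and 0 where it is unknown.
    Returns the factors and the per-epoch squared-error loss
    (observed entries only, excluding the L2 penalty).
    """
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((X.shape[0], k))  # shared row-entity factors
    V = 0.1 * rng.standard_normal((X.shape[1], k))
    W = 0.1 * rng.standard_normal((Y.shape[1], k))
    losses = []
    for _ in range(epochs):
        err_x = mask_X * (U @ V.T - X)  # ignore unobserved entries of X
        err_y = U @ W.T - Y
        losses.append(0.5 * (np.sum(err_x**2) + np.sum(err_y**2)))
        # Gradients of the squared error plus an L2 penalty on each factor.
        grad_U = err_x @ V + err_y @ W + reg * U
        grad_V = err_x.T @ U + reg * V
        grad_W = err_y.T @ U + reg * W
        U -= lr * grad_U
        V -= lr * grad_V
        W -= lr * grad_W
    return U, V, W, losses


# Synthetic low-rank data: 6 row entities, 5 columns in X, 4 columns in Y.
rng = np.random.default_rng(1)
U_true = rng.standard_normal((6, 2))
X = U_true @ rng.standard_normal((5, 2)).T
Y = U_true @ rng.standard_normal((4, 2)).T
mask_X = (rng.random(X.shape) > 0.3).astype(float)  # hide roughly 30% of X

U, V, W, losses = cmf_fit(X, Y, mask_X)
X_completed = U @ V.T  # predictions for all entries, including hidden ones
```

Because `U` appears in both reconstruction terms, the auxiliary matrix `Y` influences the predictions for the hidden entries of `X`. The inner products `U @ V.T` and `U @ W.T` are the linear interaction that, according to the abstract, dCMF replaces with a non-linear model.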
Pages: 1395-1420
Number of pages: 25