Structure-revealing data fusion

Cited by: 85
Authors
Acar, Evrim [1]
Papalexakis, Evangelos E. [2]
Gurdeniz, Gozde [3]
Rasmussen, Morten A. [1]
Lawaetz, Anders J. [1]
Nilsson, Mathias [1, 4]
Bro, Rasmus [1]
Affiliations
[1] Univ Copenhagen, Fac Sci, Dept Food Sci, Frederiksberg C, Denmark
[2] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
[3] Univ Copenhagen, Fac Sci, Dept Nutr Exercise & Sports, Frederiksberg C, Denmark
[4] Univ Manchester, Sch Chem, Manchester M13 9PL, Lancs, England
Source
BMC Bioinformatics | 2014 / Volume 15
Keywords
Data fusion; Coupled matrix and tensor factorizations; Optimization; Sparsity; NMR; DOSY; MS; SPECTROMETRY DATA; MULTIWAY ANALYSIS; SPECTROSCOPY; COMPONENT; PARAFAC; MULTIBLOCK; FRAMEWORK; JOINT; SETS; PCA;
DOI
10.1186/1471-2105-15-239
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Analysis of data from multiple sources has the potential to enhance knowledge discovery by capturing underlying structures, which are, otherwise, difficult to extract. Fusing data from multiple sources has already proved useful in many applications in social network analysis, signal processing and bioinformatics. However, data fusion is challenging since data from multiple sources are often (i) heterogeneous (i.e., in the form of higher-order tensors and matrices), (ii) incomplete, and (iii) have both shared and unshared components. In order to address these challenges, in this paper, we introduce a novel unsupervised data fusion model based on joint factorization of matrices and higher-order tensors.
Results: While the traditional formulation of coupled matrix and tensor factorizations modeling only shared factors fails to capture the underlying structures in the presence of both shared and unshared factors, the proposed data fusion model has the potential to automatically reveal shared and unshared components through modeling constraints. Using numerical experiments, we demonstrate the effectiveness of the proposed approach in terms of identifying shared and unshared components. Furthermore, we measure a set of mixtures with known chemical composition using both LC-MS (Liquid Chromatography - Mass Spectrometry) and NMR (Nuclear Magnetic Resonance) and demonstrate that the structure-revealing data fusion model can (i) successfully capture the chemicals in the mixtures and extract the relative concentrations of the chemicals accurately, (ii) provide promising results in terms of identifying shared and unshared chemicals, and (iii) reveal the relevant patterns in LC-MS by coupling with the diffusion NMR data.
Conclusions: We have proposed a structure-revealing data fusion model that can jointly analyze heterogeneous, incomplete data sets with shared and unshared components and demonstrated its promising performance as well as potential limitations on both simulated and real data.
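Illustrative sketch (not part of the original record): the abstract contrasts the proposed model with the traditional coupled matrix and tensor factorization, in which all factors of the coupled mode are shared. The NumPy code below is a minimal sketch of that traditional formulation only, fitted with plain alternating least squares. It is not the authors' implementation. It assumes a complete third-order tensor X and a matrix Y coupled in their first mode, and it omits the modeling constraints that the paper uses to reveal unshared components as well as any handling of missing entries. The names cmtf_als and cmtf_loss are hypothetical.

```python
import numpy as np


def cmtf_loss(X, Y, A, B, C, V):
    """Sum of squared residuals for the tensor and the coupled matrix."""
    X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)  # CP model of the tensor
    Y_hat = A @ V.T                              # matrix model sharing A
    return np.sum((X - X_hat) ** 2) + np.sum((Y - Y_hat) ** 2)


def cmtf_als(X, Y, rank, n_iter=100, seed=0):
    """Fit X ~ [[A, B, C]] and Y ~ A V^T with a shared first-mode factor A.

    Plain alternating least squares: each factor is updated in turn by
    solving its linear least-squares subproblem with the others fixed.
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    M = Y.shape[1]
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    V = rng.standard_normal((M, rank))
    history = [cmtf_loss(X, Y, A, B, C, V)]
    for _ in range(n_iter):
        # Shared factor: both data sets contribute to the normal equations.
        rhs = np.einsum("ijk,jr,kr->ir", X, B, C) + Y @ V
        gram = (B.T @ B) * (C.T @ C) + V.T @ V
        A = rhs @ np.linalg.pinv(gram)
        # Tensor-only factors.
        B = np.einsum("ijk,ir,kr->jr", X, A, C) @ np.linalg.pinv(
            (A.T @ A) * (C.T @ C)
        )
        C = np.einsum("ijk,ir,jr->kr", X, A, B) @ np.linalg.pinv(
            (A.T @ A) * (B.T @ B)
        )
        # Matrix-only factor.
        V = Y.T @ A @ np.linalg.pinv(A.T @ A)
        history.append(cmtf_loss(X, Y, A, B, C, V))
    return A, B, C, V, history


# Synthetic example: two components, both shared between X and Y.
rng = np.random.default_rng(1)
A_true = rng.standard_normal((10, 2))
B_true = rng.standard_normal((8, 2))
C_true = rng.standard_normal((6, 2))
V_true = rng.standard_normal((12, 2))
X = np.einsum("ir,jr,kr->ijk", A_true, B_true, C_true)
Y = A_true @ V_true.T

A, B, C, V, history = cmtf_als(X, Y, rank=2, n_iter=50)
print("initial loss:", history[0], "final loss:", history[-1])
```

Because every update solves its subproblem exactly, the recorded loss should not increase from one sweep to the next. When some components are present in only one data set, this all-shared formulation is the case the abstract describes as failing to capture the underlying structure.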
Pages: 17