Data Fusion by Matrix Factorization

Cited by: 156
Authors
Zitnik, Marinka [1]
Zupan, Blaz [1, 2]
Affiliations
[1] Univ Ljubljana, Fac Comp & Informat Sci, SI-1000 Ljubljana, Slovenia
[2] Baylor Coll Med, Dept Mol & Human Genet, Houston, TX 77030 USA
Keywords
Data fusion; intermediate data integration; matrix factorization; data mining; bioinformatics; cheminformatics; INFORMATION FUSION; DISCOVERY; NETWORKS; KNOWLEDGE; FRAMEWORK; MODULES; CANCER
DOI
10.1109/TPAMI.2014.2343973
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
For most problems in science and engineering we can obtain data sets that describe the observed system from various perspectives and record the behavior of its individual components. Heterogeneous data sets can be collectively mined by data fusion. Fusion can focus on a specific target relation and exploit directly associated data together with contextual data and data about the system's constraints. In the paper we describe a data fusion approach with penalized matrix tri-factorization (DFMF) that simultaneously factorizes data matrices to reveal hidden associations. The approach can directly consider any data that can be expressed in a matrix, including those from feature-based representations, ontologies, associations and networks. We demonstrate the utility of DFMF for a gene function prediction task with eleven different data sources and for the prediction of pharmacologic actions by fusing six data sources. Our data fusion algorithm compares favorably to alternative data integration approaches and achieves higher accuracy than can be obtained from any single data source alone.
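To make the term "matrix tri-factorization" concrete, the following is a minimal sketch of factorizing a single nonnegative relation matrix R into three factors, R ≈ G1 S G2^T, using standard multiplicative updates. It is not the authors' DFMF algorithm: DFMF factorizes several matrices simultaneously and adds constraint penalties, neither of which is shown here. The function name `tri_factorize`, the ranks `k1` and `k2`, and the iteration count are assumptions chosen for illustration only.

```python
import numpy as np


def tri_factorize(R, k1, k2, n_iter=200, eps=1e-9, seed=0):
    """Approximate a nonnegative matrix R (n x m) as G1 @ S @ G2.T.

    Illustrative single-matrix sketch only (hypothetical helper, not DFMF).
    G1 is n x k1, S is k1 x k2, G2 is m x k2; all factors stay nonnegative.
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    # Strictly positive random initialization keeps the updates well defined.
    G1 = rng.random((n, k1)) + 0.1
    S = rng.random((k1, k2)) + 0.1
    G2 = rng.random((m, k2)) + 0.1
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates for the squared Frobenius reconstruction error.
        G1 *= (R @ G2 @ S.T) / (G1 @ S @ G2.T @ G2 @ S.T + eps)
        G2 *= (R.T @ G1 @ S) / (G2 @ S.T @ G1.T @ G1 @ S + eps)
        S *= (G1.T @ R @ G2) / (G1.T @ G1 @ S @ G2.T @ G2 + eps)
        errors.append(np.linalg.norm(R - G1 @ S @ G2.T))
    return G1, S, G2, errors


if __name__ == "__main__":
    # Synthetic low-rank nonnegative relation matrix (made-up data).
    rng = np.random.default_rng(1)
    R = rng.random((20, 3)) @ rng.random((3, 4)) @ rng.random((4, 15))
    G1, S, G2, errors = tri_factorize(R, k1=3, k2=4)
    print("first and last reconstruction error:", errors[0], errors[-1])
```

In a fusion setting, the rows of G1 and G2 would act as latent profiles of the two object types related by R, and the product G1 S G2^T would supply scores for unobserved associations. Sharing such factors across several relation matrices is the part this sketch leaves out.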
Pages: 41-53
Number of pages: 13