Multi-omics integration: a comparison of unsupervised clustering methodologies

Cited by: 94
Authors
Tini, Giulia [1, 2]
Marchetti, Luca [3, 4]
Priami, Corrado [5]
Scott-Boyer, Marie-Pier
Affiliations
[1] Univ Trento, Math, Trento, Italy
[2] COSBI, Trento, Italy
[3] Univ Verona, Verona, Italy
[4] COSBI, Computat Biol Team, Trento, Italy
[5] Univ Trento, Comp Sci, Trento, Italy
Keywords
molecular-level interaction; biological systems; unsupervised classification; data preprocessing; JOINT; DISCOVERY; MODULES; BREAST; ONPLS
DOI
10.1093/bib/bbx167
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification code
071010; 081704
Abstract
With recent developments in multi-omics integration, interest has grown in factors such as data preprocessing, the choice of integration method and the number of omics considered. This work explores the impact of these factors on the problem of sample classification by comparing the performance of five unsupervised algorithms: Multiple Canonical Correlation Analysis, Multiple Co-Inertia Analysis, Multiple Factor Analysis, Joint and Individual Variation Explained, and Similarity Network Fusion. The methods were applied to three real data sets taken from the literature and to several ad hoc simulated scenarios to assess classification performance under different conditions of noise and signal strength across the data types. The impact of experimental design, feature selection and parameter training was also evaluated to identify conditions that can affect the accuracy of the results.
Pages: 1269-1279
Number of pages: 11