Canonical correlation analysis of multiple sensory directed metabolomics data blocks reveals corresponding parts between data blocks

Cited by: 11
Authors
Doeswijk, T. G. [1 ,3 ]
Hageman, J. A. [1 ]
Westerhuis, J. A. [2 ,3 ]
Tikunov, Y. [4 ]
Bovy, A. [4 ]
van Eeuwijk, F. A. [1 ,3 ,4 ]
Affiliations
[1] Wageningen Univ, NL-6708 AC Wageningen, Netherlands
[2] Univ Amsterdam, NL-1098 XH Amsterdam, Netherlands
[3] Netherlands Metab Ctr, NL-2333 CC Leiden, Netherlands
[4] Ctr Biosyst Genom, NL-6700 AB Wageningen, Netherlands
Keywords
Metabolomics; Regression; Partial least squares; Canonical correlation analysis; Redundancy; Data fusion; Variable selection; Component analysis; Multiblock; Quality; Prediction; Fusion; Models; PLS
DOI
10.1016/j.chemolab.2011.05.010
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Multiple analytical platforms are frequently used in metabolomics studies. The resulting multiple data blocks contain, in general, similar parts of information which can be disclosed by chemometric methods. The metabolites of interest, however, are usually just a minor part of the complete data block and are related to a response of interest such as quality traits. Concatenation of data matrices is frequently used to simultaneously analyze multiple data blocks. Two main problems may occur with this approach: 1) the number of variables becomes very large in relation to the number of observations which may deteriorate model performance, and 2) scaling issues between the data blocks need to be resolved. Therefore, a method is proposed that circumvents direct concatenation of two data matrices but does uncover the shared and distinct parts of the data sets in relation to quality traits. The relevant part of the data blocks with respect to the quality trait of interest is revealed by partial least squares regression on each of the data blocks. The score vectors of both models that are predictive for the quality trait are then used in a canonical correlation analysis. Highly correlating score vectors indicate parts of the data blocks that are closely related. By inspecting the relevant loading vectors, the metabolites of interest are revealed. (C) 2011 Elsevier B.V. All rights reserved.
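The two-step procedure described in the abstract (a PLS regression per data block, followed by a canonical correlation analysis of the resulting score vectors) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the synthetic data, the function names `pls1_scores` and `canonical_correlations`, the use of NIPALS for PLS1, and the choice of two components per block. For simplicity, the sketch passes all extracted score vectors to the CCA rather than first selecting only those that are predictive for the quality trait.

```python
import numpy as np


def pls1_scores(X, y, n_components):
    """Return PLS1 score vectors (n x n_components) via NIPALS (illustrative)."""
    X = X - X.mean(axis=0)          # column-center the data block
    y = y - y.mean()                # center the response
    T = np.zeros((X.shape[0], n_components))
    for a in range(n_components):
        w = X.T @ y                 # weight vector: covariance with the response
        w /= np.linalg.norm(w)
        t = X @ w                   # score vector
        p = X.T @ t / (t @ t)       # X loading
        q = (y @ t) / (t @ t)       # y loading
        X = X - np.outer(t, p)      # deflate the block
        y = y - q * t               # deflate the response
        T[:, a] = t
    return T


def canonical_correlations(T1, T2):
    """Canonical correlations between two score matrices via QR and SVD."""
    Q1, _ = np.linalg.qr(T1 - T1.mean(axis=0))
    Q2, _ = np.linalg.qr(T2 - T2.mean(axis=0))
    s = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    return np.clip(s, 0.0, 1.0)     # guard against rounding slightly above 1


# Hypothetical data: two blocks sharing one latent factor z that drives the trait y.
rng = np.random.default_rng(0)
n = 40
z = rng.normal(size=n)
X1 = np.outer(z, rng.normal(size=30)) + 0.1 * rng.normal(size=(n, 30))
X2 = np.outer(z, rng.normal(size=20)) + 0.1 * rng.normal(size=(n, 20))
y = z + 0.1 * rng.normal(size=n)

T1 = pls1_scores(X1, y, n_components=2)   # scores of block 1
T2 = pls1_scores(X2, y, n_components=2)   # scores of block 2
rho = canonical_correlations(T1, T2)
print(rho)
```

In this synthetic setting, a first canonical correlation close to 1 would indicate that both blocks carry the same trait-related information. In a real analysis, the loading vectors belonging to the highly correlated scores would then be inspected to identify the metabolites involved.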
Pages: 371-376
Page count: 6
Related papers
27 in total
[1]   Consequences of sample size, variable selection, and model validation and optimisation, for predicting classification ability from analytical data [J].
Brereton, Richard G. .
TRAC-TRENDS IN ANALYTICAL CHEMISTRY, 2006, 25 (11) :1103-1111
[2]   Use of network analysis to capture key traits affecting tomato organoleptic quality [J].
Carli, Paola ;
Arima, Serena ;
Fogliano, Vincenzo ;
Tardella, Luca ;
Frusciante, Luigi ;
Ercolano, Maria R. .
JOURNAL OF EXPERIMENTAL BOTANY, 2009, 60 (12) :3379-3386
[3]   A generalization of principal component analysis to K sets of variables [J].
Casin, P .
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2001, 35 (04) :417-428
[4]   SIMPLS - AN ALTERNATIVE APPROACH TO PARTIAL LEAST-SQUARES REGRESSION [J].
DEJONG, S .
CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 1993, 18 (03) :251-263
[5]   An introduction to Multi-block Component Analysis by means of a flavor language case study [J].
Derks, EPPA ;
Westerhuis, JA ;
Smilde, AK ;
King, BM .
FOOD QUALITY AND PREFERENCE, 2003, 14 (5-6) :497-506
[6]   Evaluation of different techniques for data fusion of LC/MS and 1H-NMR [J].
Forshed, Jenny ;
Idborg, Helena ;
Jacobsson, Sven P. .
CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2007, 85 (01) :102-109
[7]   GENERALIZED PROCRUSTES ANALYSIS [J].
GOWER, JC .
PSYCHOMETRIKA, 1975, 40 (01) :33-51
[8]   Enriched biplots for canonical correlation analysis [J].
Graffelman, J .
JOURNAL OF APPLIED STATISTICS, 2005, 32 (02) :173-188
[9]  
HAGEMAN J, 2010, EUPHYTICA, V1
[10]   Wavelength selection with Tabu Search [J].
Hageman, JA ;
Streppel, M ;
Wehrens, R ;
Buydens, LMC .
JOURNAL OF CHEMOMETRICS, 2003, 17 (8-9) :427-437