A classification tool for N-way array based on SIMCA methodology

Cited by: 45
Authors
Durante, Caterina [1]
Bro, Rasmus [2]
Cocchi, Marina [1]
Affiliations
[1] Univ Modena & Reggio Emilia, Dept Chem, I-41125 Modena, Italy
[2] Univ Copenhagen, Dept Food Sci, Fac Life Sci, DK-1958 Frederiksberg C, Denmark
Keywords
SIMCA; Multi-way classification; Discriminant analysis; Class modelling; PARAFAC; Tucker; PRINCIPAL COMPONENT ANALYSIS; PARALLEL FACTOR-ANALYSIS; MULTIVARIATE CLASSIFICATION; OLIVE OILS; FLUORESCENCE; SPECTROSCOPY; MODELS;
DOI
10.1016/j.chemolab.2010.09.004
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In the literature there are only a few papers concerned with classification methods for multi-way arrays. The most common procedure, by far, is to unfold the multi-way data array into an ordinary matrix and then apply traditional multivariate classification tools. As an alternative to unfolding, several possibilities exist for building classification models based more directly on the multi-way structure of the data. For example, multi-way partial least squares discriminant analysis has been used as a supervised classification method. Another alternative that has been investigated is to perform classification using Fisher's LDA or SIMCA on the score matrix from, e.g., a PARAFAC or a Tucker model. Despite a few attempts at applying such multi-way classification approaches, no one has looked into how such models are best built and implemented. In this work, the SIMCA method is extended to three-way arrays. Also included is actual code that works on general multi-way arrays rather than just three-way arrays. In analogy with two-way SIMCA, a decomposition model is built separately for the multi-way data of each class, using a multi-way decomposition method such as PARAFAC or Tucker3. In choosing the best class dimensionality, i.e. the number of latent factors, the results of cross-validation and, above all, the sensitivity/specificity values are evaluated. To estimate the class limits for each class model, orthogonal and score distances are considered, and different statistics are implemented and tested to set confidence limits for these two parameters. Classification performance is compared across different definitions of class boundaries and classification rules, including the use of cross-validated residuals and scores. In addition to simulated data sets of varying dimensionality, the proposed N-SIMCA methodology and code have been tested on two case studies concerning food authentication tasks for typical food products.
(C) 2010 Elsevier B.V. All rights reserved.
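The class-modelling idea summarized in the abstract (one decomposition model per class, then score and orthogonal distances compared against class limits) can be illustrated with a minimal sketch. This is not the authors' published code. It makes several assumptions that the abstract does not state: the class model is a truncated HOSVD-style Tucker projection on the two variable modes instead of a fitted PARAFAC or Tucker3 model, the score distance is a Mahalanobis distance on the vectorized core scores, and both class limits are simple empirical quantiles of the training distances. All function names (`fit_class_model`, `project`, `score_distance`, `accepts`, `simulate`) and the simulated data are hypothetical.

```python
import numpy as np

def project(model, X):
    """Project samples X (n, J, K) onto a class model.
    Returns vectorized core scores (n, q*r) and orthogonal distances (n,)."""
    Xc = X - model["mean"]
    G = np.einsum("jq,ijk,kr->iqr", model["B"], Xc, model["C"])    # core scores
    Xhat = np.einsum("jq,iqr,kr->ijk", model["B"], G, model["C"])  # reconstruction
    od = np.sqrt(((Xc - Xhat) ** 2).sum(axis=(1, 2)))              # residual norm
    return G.reshape(len(X), -1), od

def score_distance(model, scores):
    """Mahalanobis distance of the scores from the class centre (assumed definition)."""
    d = scores - model["score_mean"]
    sq = np.einsum("ip,pq,iq->i", d, model["score_cov_inv"], d)
    return np.sqrt(np.maximum(sq, 0.0))

def fit_class_model(X, ranks=(2, 2), quantile=0.95):
    """Fit a model on the training array X (I, J, K) of ONE class."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, J, K = Xc.shape
    # Loadings: leading left singular vectors of the mode-2 and mode-3 unfoldings.
    B = np.linalg.svd(np.moveaxis(Xc, 1, 0).reshape(J, -1), full_matrices=False)[0][:, :ranks[0]]
    C = np.linalg.svd(np.moveaxis(Xc, 2, 0).reshape(K, -1), full_matrices=False)[0][:, :ranks[1]]
    model = {"mean": mean, "B": B, "C": C}
    scores, od = project(model, X)
    model["score_mean"] = scores.mean(axis=0)
    model["score_cov_inv"] = np.linalg.pinv(np.atleast_2d(np.cov(scores, rowvar=False)))
    # Assumed class limits: empirical quantiles of the training distances.
    model["sd_limit"] = np.quantile(score_distance(model, scores), quantile)
    model["od_limit"] = np.quantile(od, quantile)
    return model

def accepts(model, X):
    """True where a sample falls inside both class limits."""
    scores, od = project(model, X)
    sd = score_distance(model, scores)
    return (sd <= model["sd_limit"]) & (od <= model["od_limit"])

def simulate(rng, b, c, n, noise=0.05):
    """Hypothetical rank-one class: random amplitude times outer(b, c), plus noise."""
    amp = 3.0 + 0.5 * rng.normal(size=n)
    return amp[:, None, None] * np.outer(b, c) + noise * rng.normal(size=(n, b.size, c.size))

rng = np.random.default_rng(0)
Qb, _ = np.linalg.qr(rng.normal(size=(8, 2)))  # orthonormal mode-2 profiles
Qc, _ = np.linalg.qr(rng.normal(size=(6, 2)))  # orthonormal mode-3 profiles
model_a = fit_class_model(simulate(rng, Qb[:, 0], Qc[:, 0], 40))
rate_a = accepts(model_a, simulate(rng, Qb[:, 0], Qc[:, 0], 200)).mean()
rate_b = accepts(model_a, simulate(rng, Qb[:, 1], Qc[:, 1], 200)).mean()
print(f"accepted: own class {rate_a:.2f}, other class {rate_b:.2f}")
```

In this sketch the fraction of own-class samples accepted plays the role of sensitivity, and one minus the fraction of other-class samples accepted plays the role of specificity. Varying `ranks` and comparing those two values is one simple way to explore the choice of class dimensionality mentioned in the abstract.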
Pages: 73-85
Page count: 13