On the generation of random multivariate data

Times cited: 9
Author
Camacho, Jose [1]
Affiliation
[1] Univ Granada, Dept Teoria Senal Telemat & Comunicac, E-18071 Granada, Spain
Keywords
Multivariate data; Simulation; ADICOV; MEDA toolbox; Montecarlo; CROSS-VALIDATION; SPARSE METHODS; MODELS; CLASSIFICATION; COMPONENTS; NUMBER; PLS
DOI
10.1016/j.chemolab.2016.11.013
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The simulation of multivariate data is often necessary to assess the performance of multivariate analysis techniques. Generating random multivariate data when the covariance matrix is completely or partly specified can be done with several methods, from the Cholesky decomposition to some recent alternatives. However, the covariance matrix often has to be generated at random as well, so that the simulated data span different situations, from highly correlated to uncorrelated data. This is the case when assessing a new multivariate analysis technique in Monte Carlo experiments. In this paper, we introduce a new algorithm for the generation of random data from covariance matrices of random structure, where the user only chooses the data dimension and the level of correlation. We illustrate the application of this algorithm in several relevant problems in multivariate analysis, namely the selection of the number of principal components in Principal Component Analysis (PCA), the evaluation of the performance of sparse Partial Least Squares (PLS), and the calibration of Multivariate Statistical Process Control (MSPC) systems. The algorithm is available as part of the MEDA Toolbox v1.1.
Pages: 40-51
Number of pages: 12