Simple and Effective Way for Data Preprocessing Selection Based on Design of Experiments

Cited by: 152
Authors
Gerretzen, Jan [1, 2]
Szymanska, Ewa [1, 2]
Jansen, Jeroen J. [1]
Bart, Jacob [3]
van Manen, Henk-Jan [3]
van den Heuvel, Edwin R. [4]
Buydens, Lutgarde M. C. [1]
Affiliations
[1] Radboud Univ Nijmegen, Inst Mol & Mat, NL-6500 GL Nijmegen, Netherlands
[2] TI COAST, NL-1098 XH Amsterdam, Netherlands
[3] AkzoNobel, Supply Chain, Res & Dev, NL-7418 AJ Deventer, Netherlands
[4] Eindhoven Univ Technol, NL-5600 MB Eindhoven, Netherlands
Keywords
PARTIAL LEAST-SQUARES; MULTIPLICATIVE SIGNAL CORRECTION; NEAR-INFRARED SPECTRA; MULTIVARIATE CALIBRATION; AQUEOUS-SOLUTIONS; NIR SPECTROSCOPY; METABOLOMICS; SPECTROMETRY; REGRESSION; HYDROXIDE;
DOI
10.1021/acs.analchem.5b02832
Chinese Library Classification number
O65 [Analytical Chemistry];
Discipline classification codes
070302; 081704;
Abstract
The selection of optimal preprocessing is among the main bottlenecks in chemometric data analysis. Preprocessing currently is a burden, since a multitude of different preprocessing methods is available for, e.g., baseline correction, smoothing, and alignment, but it is not clear beforehand which method(s) should be used for which data set. The process of preprocessing selection is often limited to trial-and-error and is therefore considered somewhat subjective. In this paper, we present a novel, simple, and effective approach for preprocessing selection. The defining feature of this approach is a design of experiments. On the basis of the design, model performance of a few well-chosen preprocessing methods, and combinations thereof (called strategies) is evaluated. Interpretation of the main effects and interactions subsequently enables the selection of an optimal preprocessing strategy. The presented approach is applied to eight different spectroscopic data sets, covering both calibration and classification challenges. We show that the approach is able to select a preprocessing strategy which improves model performance by at least 50% compared to the raw data; in most cases, it leads to a strategy very close to the true optimum. Our approach makes preprocessing selection fast, insightful, and objective.
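The idea in the abstract (evaluate a designed set of preprocessing strategies, then read off main effects and interactions) can be illustrated with a minimal sketch. It assumes a two-level full factorial design in which each preprocessing step is simply switched on or off; the abstract does not state which design the authors used. The factor names, the helper functions, and the `toy_error` response are all hypothetical: `toy_error` stands in for a cross-validated model error and its coefficients are invented purely to make the example runnable.

```python
from itertools import product

import numpy as np

# Hypothetical factor names; each preprocessing step is either off (0) or on (1).
FACTORS = ["baseline", "scatter", "smoothing", "scaling"]


def full_factorial(n_factors):
    """Return all on/off combinations as a (2**n_factors, n_factors) integer array."""
    return np.array(list(product([0, 1], repeat=n_factors)))


def main_effects(design, responses):
    """Per factor: mean response with the step on minus mean response with it off."""
    y = np.asarray(responses, dtype=float)
    return np.array([
        y[design[:, j] == 1].mean() - y[design[:, j] == 0].mean()
        for j in range(design.shape[1])
    ])


def interaction_effect(design, responses, i, j):
    """Two-factor interaction computed on -1/+1 coded levels."""
    y = np.asarray(responses, dtype=float)
    sign = (2 * design[:, i] - 1) * (2 * design[:, j] - 1)
    return y[sign == 1].mean() - y[sign == -1].mean()


def toy_error(strategy):
    """Placeholder for a cross-validated model error; coefficients are invented."""
    b, s, m, _ = strategy
    return 10.0 - 3.0 * b - 1.0 * s + 0.5 * m - 2.0 * b * s


# Each row of the design is one preprocessing strategy.
design = full_factorial(len(FACTORS))
errors = np.array([toy_error(row) for row in design])

# A negative main effect means that switching the step on lowers the error.
effects = dict(zip(FACTORS, main_effects(design, errors)))

# Strategy with the lowest placeholder error.
best = dict(zip(FACTORS, design[np.argmin(errors)]))
```

In practice, `toy_error` would be replaced by a function that applies the selected preprocessing steps to the spectra and returns a model performance measure; the effect calculations stay the same.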
Pages: 12096-12103
Number of pages: 8
Related papers
34 in total
[1]   [Anonymous], 1988, J. Chemom., DOI 10.1002/CEM.1180020106
[2]   STANDARD NORMAL VARIATE TRANSFORMATION AND DE-TRENDING OF NEAR-INFRARED DIFFUSE REFLECTANCE SPECTRA [J].
BARNES, RJ ;
DHANOA, MS ;
LISTER, SJ .
APPLIED SPECTROSCOPY, 1989, 43 (05) :772-777
[3]   Partial least squares discriminant analysis: taking the magic away [J].
Brereton, Richard G. ;
Lloyd, Gavin R. .
JOURNAL OF CHEMOMETRICS, 2014, 28 (04) :213-225
[4]   Theory and application of near infrared reflectance spectroscopy in determination of food quality [J].
Cen, Haiyan ;
He, Yong .
TRENDS IN FOOD SCIENCE & TECHNOLOGY, 2007, 18 (02) :72-83
[5]   Determination of total polyphenols content in green tea using FT-NIR spectroscopy and different PLS algorithms [J].
Chen, Quansheng ;
Zhao, Jiewen ;
Liu, Muhua ;
Cai, Jianrong ;
Liu, Jianhua .
JOURNAL OF PHARMACEUTICAL AND BIOMEDICAL ANALYSIS, 2008, 46 (03) :568-573
[6]   Performance of some variable selection methods when multicollinearity is present [J].
Chong, IG ;
Jun, CH .
CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2005, 78 (1-2) :103-112
[7]   Start-to-end processing of two-dimensional gel electrophoretic images [J].
Daszykowski, M. ;
Stanimirova, I. ;
Bodzon-Kulakowska, A. ;
Silberring, J. ;
Lubec, G. ;
Walczak, B. .
JOURNAL OF CHROMATOGRAPHY A, 2007, 1158 (1-2) :306-317
[8]   SIMPLS - AN ALTERNATIVE APPROACH TO PARTIAL LEAST-SQUARES REGRESSION [J].
DEJONG, S .
CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 1993, 18 (03) :251-263
[9]   Parametric time warping [J].
Eilers, PHC .
ANALYTICAL CHEMISTRY, 2004, 76 (02) :404-411
[10]   A perfect smoother [J].
Eilers, PHC .
ANALYTICAL CHEMISTRY, 2003, 75 (14) :3631-3636