Benchmarking support vector regression against partial least squares regression and artificial neural network: Effect of sample size on model performance

Cited by: 38
Authors
Tange, Rikke Ingemann [1]
Rasmussen, Morten Arendt [1, 2]
Taira, Eizo [3]
Bro, Rasmus [1]
Affiliations
[1] Univ Copenhagen, Dept Food Sci, Fac Sci, Frederiksberg, Denmark
[2] Copenhagen Univ Hosp, Copenhagen Prospect Studies Asthma Childhood, Copenhagen, Denmark
[3] Univ Ryukyus, Fac Agr, Nishihara, Okinawa, Japan
Keywords
Sample size; prediction performance; cross-validation; high dimensional data; artificial neural network; partial least squares regression; support vector regression; near infrared; prediction; machines
DOI
10.1177/0967033517734945
Chinese Library Classification
O69 [Applied Chemistry]
Discipline classification code
081704
Abstract
It has become easy to obtain high-dimensional multivariate chemical data. However, obtaining a large number of samples or acquiring reference measurements can be expensive or time-consuming, so the number of samples available for multivariate calibration modelling may be limited. If the data contain nonlinear relationships, nonlinear methods are required for the calibration task. Combining a limited amount of high-dimensional data with highly flexible nonlinear methods may produce overfitted models, which in turn perform poorly on new data. For real-world applications, it is therefore desirable to understand how sample size affects a model's prediction performance. For this purpose, we compared partial least squares regression, artificial neural network, and support vector regression applied to three real-world nonlinear datasets, two of which were high-dimensional. We evaluated (i) the effect of calibration sample size on test set performance, including the variation in test set performance caused by sampling variation, and (ii) whether the cross-validated performance was adequate for assessing predictive ability. We demonstrated the applicability of artificial neural network and support vector regression to real-world data of limited size and showed that support vector regression had advantages over artificial neural network: (i) fewer calibration samples were required to obtain a desired model performance, (ii) support vector regression was less sensitive to sampling variation for small sample sets, and (iii) cross-validation was an approximately unbiased option for evaluating the true support vector regression model performance, even for small sample sets.
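The evaluation protocol summarised in the abstract (repeated random calibration subsets of increasing size, the spread of test set performance across repeats, and a comparison of cross-validated against test set performance) can be illustrated with a minimal sketch. This is not the authors' code. The synthetic sine data, the k-nearest-neighbour regressor used as a stand-in for PLS, ANN and SVR, RMSE as the performance measure, the interleaved fold scheme, and all function names (`make_data`, `knn_predict`, `rmse`, `cv_rmse`, `learning_curve`) are hypothetical assumptions, chosen so the example runs with the Python standard library alone.

```python
import random
import statistics
from math import pi, sin, sqrt


def make_data(n, rng, noise=0.1):
    """Hypothetical nonlinear data: y = sin(x) + Gaussian noise, x in [0, 2*pi]."""
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 2.0 * pi)
        data.append((x, sin(x) + rng.gauss(0.0, noise)))
    return data


def knn_predict(train, x, k=3):
    """Stand-in nonlinear regressor (assumption, not PLS/ANN/SVR):
    mean y of the k calibration samples nearest to x."""
    k = min(k, len(train))
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def rmse(train, data, k=3):
    """RMSE on `data` of a model fitted on the calibration set `train`."""
    sq = [(knn_predict(train, x, k) - y) ** 2 for x, y in data]
    return sqrt(sum(sq) / len(sq))


def cv_rmse(cal, folds=5, k=3):
    """K-fold cross-validated RMSE computed on the calibration set only.
    Fold f holds out every sample whose index i satisfies i % folds == f."""
    folds = min(folds, len(cal))
    sq = []
    for f in range(folds):
        held_out = cal[f::folds]
        train = [p for i, p in enumerate(cal) if i % folds != f]
        sq.extend((knn_predict(train, x, k) - y) ** 2 for x, y in held_out)
    return sqrt(sum(sq) / len(sq))


def learning_curve(pool, test, sizes, repeats=20, seed=0):
    """For each calibration size n, draw `repeats` random calibration sets from
    `pool`; summarise test RMSE (mean and SD across draws, i.e. sampling
    variation) and the mean gap between cross-validated and test RMSE."""
    rng = random.Random(seed)
    results = {}
    for n in sizes:
        test_scores, cv_scores = [], []
        for _ in range(repeats):
            cal = rng.sample(pool, n)
            test_scores.append(rmse(cal, test))
            cv_scores.append(cv_rmse(cal))
        results[n] = {
            "test_mean": statistics.mean(test_scores),
            "test_sd": statistics.stdev(test_scores),
            "cv_bias": statistics.mean(cv_scores) - statistics.mean(test_scores),
        }
    return results


if __name__ == "__main__":
    rng = random.Random(42)
    pool = make_data(400, rng)  # samples available for calibration
    test = make_data(200, rng)  # independent test set
    for n, r in learning_curve(pool, test, sizes=[8, 16, 32, 64, 128]).items():
        print(f"n={n:4d}  test RMSE={r['test_mean']:.3f} +/- {r['test_sd']:.3f}  "
              f"CV - test={r['cv_bias']:+.3f}")
```

Replacing `knn_predict` with an actual PLS, ANN or SVR model would give curves comparable in spirit to those the abstract describes: a shrinking `test_sd` as n grows reflects lower sensitivity to sampling variation, and a `cv_bias` near zero across sizes is what approximately unbiased cross-validation would look like.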
Pages: 381-390
Page count: 10