Comparison of the performance of multiclass classifiers in chemical data: Addressing the problem of overfitting with the permutation test

Cited by: 18
Authors
de Andrade, Barbara M. [1]
de Gois, Jefferson S. [1]
Xavier, Vinicius L. [2]
Luna, Aderval S. [1]
Affiliations
[1] Univ Estado Rio De Janeiro, Grad Program Chem Engn, Rua Sao Francisco Xavier 524, BR-20550900 Rio De Janeiro, RJ, Brazil
[2] Univ Estado Rio De Janeiro, Inst Math & Stat, Rua Sao Francisco Xavier 524, BR-20550900 Rio De Janeiro, RJ, Brazil
Keywords
Pattern recognition; Glass; Wine; Overfitting; Permutation test; ACCURACY; SELECTION; MACHINE; MODELS; KAPPA
DOI
10.1016/j.chemolab.2020.104013
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The objective of this work was to apply different pattern recognition techniques to datasets commonly used as chemometric case studies, namely the Glass Identification Dataset and the Wine Quality Dataset. Three types of classification models were used. The first type was based on discriminant analysis and other linear classification models: Linear Discriminant Analysis (LDA), Regularized Discriminant Analysis (RDA), Mixture Discriminant Analysis (MDA), and Partial Least Squares Discriminant Analysis (PLS-DA). The second type was based on nonlinear classification models: Artificial Neural Networks (ANN), Support Vector Machine (SVM) with a radial kernel function, k-Nearest Neighbors (k-NN), Naive Bayes (NB), and Learning Vector Quantization (LVQ). The last type was based on classification trees and rule-based models: Classification and Regression Tree (CART), Bagging, Random Forest (RF), C5.0, and Generalized Boosted Machine (GBM). The results obtained outperformed the classification results previously published in the literature. The computational experiments show that LVQ was the only method able to classify all three datasets correctly. Permutation tests were applied to check for overfitting. The results showed no evidence of overfitting, a finding confirmed by the pairwise Wilcoxon signed-rank test.
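The permutation test mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: the nearest-centroid classifier, the synthetic three-class data, and every function name below (`make_toy_data`, `cv_accuracy`, `permutation_test`) are assumptions chosen only to keep the example self-contained. The idea is to compare the cross-validated accuracy obtained with the true labels against the accuracies obtained after repeatedly shuffling the labels; if shuffled labels rarely match the observed score, the model has likely captured real structure rather than noise.

```python
import random


def make_toy_data(seed=1, per_class=30):
    """Hypothetical 3-class data: Gaussian blobs, classes interleaved."""
    rng = random.Random(seed)
    centers = {"A": (0.0, 0.0), "B": (4.0, 0.0), "C": (0.0, 4.0)}
    X, y = [], []
    for _ in range(per_class):
        for label, (cx, cy) in centers.items():
            X.append([rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)])
            y.append(label)
    return X, y


def fit_centroids(X, y):
    """Return a dict mapping each class label to its feature-wise mean."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, row):
    """Assign the label of the closest centroid (squared Euclidean distance)."""
    def sqdist(label):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[label]))
    return min(centroids, key=sqdist)


def cv_accuracy(X, y, k=5):
    """k-fold cross-validated accuracy using interleaved folds."""
    n = len(X)
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_X = [X[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        centroids = fit_centroids(train_X, train_y)
        correct += sum(predict(centroids, X[i]) == y[i] for i in test_idx)
    return correct / n


def permutation_test(X, y, n_permutations=200, seed=0):
    """Return the observed CV accuracy and a permutation p-value."""
    rng = random.Random(seed)
    observed = cv_accuracy(X, y)
    shuffled = list(y)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break the link between features and labels
        if cv_accuracy(X, shuffled) >= observed:
            at_least_as_good += 1
    # The +1 terms count the observed labelling as one of the permutations.
    p_value = (at_least_as_good + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    X, y = make_toy_data()
    observed, p_value = permutation_test(X, y)
    print(f"CV accuracy: {observed:.3f}, permutation p-value: {p_value:.4f}")
```

A small p-value indicates that the observed accuracy is unlikely under random labelling. In a study comparing many classifiers, the same routine would be run once per model, with the model's own fitting and prediction functions swapped in.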
Pages: 7