Comparison of the performance of multiclass classifiers in chemical data: Addressing the problem of overfitting with the permutation test

Cited by: 18
Authors
de Andrade, Barbara M. [1]
de Gois, Jefferson S. [1]
Xavier, Vinicius L. [2]
Luna, Aderval S. [1]
Affiliations
[1] Univ Estado Rio De Janeiro, Grad Program Chem Engn, Rua Sao Francisco Xavier 524, BR-20550900 Rio De Janeiro, RJ, Brazil
[2] Univ Estado Rio De Janeiro, Inst Math & Stat, Rua Sao Francisco Xavier 524, BR-20550900 Rio De Janeiro, RJ, Brazil
Keywords
Pattern recognition; Glass; Wine; Overfitting; Permutation test; ACCURACY; SELECTION; MACHINE; MODELS; KAPPA;
DOI
10.1016/j.chemolab.2020.104013
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The objective of this work was to apply different pattern recognition techniques to datasets commonly used as chemometric case studies, i.e., the Glass Identification Dataset and the Wine Quality Dataset. Three types of classification models were used. The first type comprised discriminant analysis and other linear classification models: Linear Discriminant Analysis (LDA), Regularized Discriminant Analysis (RDA), Mixture Discriminant Analysis (MDA), and Partial Least Squares Discriminant Analysis (PLS-DA). The second type comprised nonlinear classification models: Artificial Neural Networks (ANN), Support Vector Machine (SVM) with a radial kernel function, k-Nearest Neighbors (k-NN), Naive Bayes (NB), and Learning Vector Quantization (LVQ). The last type comprised classification trees and rule-based models: Classification and Regression Tree (CART), Bagging, Random Forest (RF), C5.0, and Generalized Boosted Machine (GBM). The results obtained outperformed the classification results of works previously published in the literature. The computational experiments show that LVQ was the only method able to classify all three datasets correctly. Permutation tests were applied to check for overfitting. The results showed that overfitting was absent, which was confirmed by the pairwise Wilcoxon signed-rank test.
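The permutation test mentioned in the abstract can be illustrated with a minimal sketch. This is a generic label-permutation test, not the authors' actual procedure, which this record does not describe. The nearest-centroid classifier, the leave-one-out scoring, the synthetic three-class data, and every function name below are assumptions chosen only to keep the example self-contained. The idea is that a model scoring about as well on shuffled labels as on the true labels is probably fitting noise.

```python
import random


def nearest_centroid_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier (toy stand-in model)."""
    n = len(X)
    correct = 0
    for i in range(n):
        # Build class centroids from every sample except the held-out one.
        sums, counts = {}, {}
        for j in range(n):
            if j == i:
                continue
            label = y[j]
            if label not in sums:
                sums[label] = [0.0] * len(X[j])
                counts[label] = 0
            sums[label] = [s + v for s, v in zip(sums[label], X[j])]
            counts[label] += 1
        # Predict the class whose centroid is closest (squared Euclidean distance).
        best_label, best_dist = None, float("inf")
        for label, total in sums.items():
            centroid = [v / counts[label] for v in total]
            dist = sum((a - b) ** 2 for a, b in zip(X[i], centroid))
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == y[i]
    return correct / n


def permutation_test(X, y, n_permutations=200, seed=0):
    """Return the observed score and the fraction of label shuffles that match or beat it."""
    rng = random.Random(seed)
    observed = nearest_centroid_accuracy(X, y)
    shuffled = list(y)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break the link between features and labels
        if nearest_centroid_accuracy(X, shuffled) >= observed:
            at_least_as_good += 1
    # The +1 terms count the unpermuted labelling as one of the permutations.
    p_value = (at_least_as_good + 1) / (n_permutations + 1)
    return observed, p_value


def make_toy_data(seed=1, n_per_class=15):
    """Synthetic, well-separated three-class data (illustrative only, not the paper's data)."""
    rng = random.Random(seed)
    centers = {"a": (0.0, 0.0), "b": (5.0, 0.0), "c": (0.0, 5.0)}
    X, y = [], []
    for label, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            X.append([rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)])
            y.append(label)
    return X, y


X, y = make_toy_data()
observed, p_value = permutation_test(X, y)
print(f"observed accuracy = {observed:.3f}, permutation p-value = {p_value:.4f}")
```

A small p-value means the shuffled-label scores rarely reach the observed score, which is consistent with the model having learned real class structure. In practice the classifier and validation scheme would be replaced by whichever model is being assessed.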
Pages: 7