A Comparison of Machine Learning Methods in a High-Dimensional Classification Problem

Cited by: 8
Authors
Zekic-Susac, Marijana [1]
Pfeifer, Sanja [1]
Sarlija, Natasa [1]
Affiliations
[1] Univ Josip Juraj Strossmayer Osijek, Fac Econ, Osijek, Croatia
Source
BUSINESS SYSTEMS RESEARCH JOURNAL | 2014 / Vol. 5 / Issue 03
Keywords
machine learning; support vector machines; artificial neural networks; CART classification trees; k-nearest neighbour; large-dimensional data; cross-validation
DOI
10.2478/bsrj-2014-0021
Chinese Library Classification number
F [Economics]
Discipline classification code
02
Abstract
Background: Modelling of large-dimensional data often relies on variable reduction methods in the pre-processing and post-processing stages. However, such a reduction usually discards information and lowers the accuracy of the model. Objectives: The aim of this paper is to address the high-dimensional classification problem of recognizing the entrepreneurial intentions of students using machine learning methods. Methods/Approach: Four methods were tested on the same dataset: artificial neural networks, CART classification trees, support vector machines, and k-nearest neighbour, in order to compare their classification accuracy. The performance of each method was compared on ten subsamples in a 10-fold cross-validation procedure, and the sensitivity and specificity of each model were computed. Results: The artificial neural network model based on a multilayer perceptron yielded a higher classification rate than the models produced by the other methods. The pairwise t-test showed a statistically significant difference between the artificial neural network and the k-nearest neighbour model, while the differences among the other methods were not statistically significant. Conclusions: The tested machine learning methods are able to learn fast and achieve high classification accuracy. However, further improvement may be achieved by testing a few additional methodological refinements in machine learning methods.
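The comparison procedure described in the abstract (four classifiers, 10-fold cross-validation on the same folds, then a paired t-test on the per-fold accuracies) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes scikit-learn and SciPy, uses a synthetic dataset from `make_classification` as a stand-in for the student survey data, and every hyperparameter (hidden layer size, `n_neighbors`, kernel, sample and feature counts) is an arbitrary placeholder not taken from the paper. Sensitivity and specificity are left out to keep the sketch short.

```python
import numpy as np
from scipy.stats import ttest_rel
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic high-dimensional data stands in for the real dataset.
X, y = make_classification(
    n_samples=300, n_features=50, n_informative=10, random_state=0
)

# Assumption: all hyperparameters below are placeholders, not the paper's settings.
models = {
    "ANN (MLP)": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(20,), max_iter=1000, random_state=0),
    ),
    "CART": DecisionTreeClassifier(random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "k-NN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

# A fixed random_state makes every model see the same ten folds,
# which is what allows the per-fold accuracies to be paired.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    for name, model in models.items()
}

for name, fold_acc in scores.items():
    print(f"{name}: mean accuracy {fold_acc.mean():.3f} (std {fold_acc.std():.3f})")

# Paired t-test on the per-fold accuracies of two of the models.
t_stat, p_value = ttest_rel(scores["ANN (MLP)"], scores["k-NN"])
print(f"ANN vs k-NN: t = {t_stat:.3f}, p = {p_value:.3f}")
```

Because the data here are synthetic, the printed accuracies and p-value say nothing about the paper's results; the sketch only shows the shape of the procedure. The same `ttest_rel` call can be repeated for each pair of models to obtain the full set of pairwise comparisons.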
Pages: 82-96
Number of pages: 15