A Comparison of Machine Learning Methods in a High-Dimensional Classification Problem

Cited by: 8
Authors
Zekic-Susac, Marijana [1]
Pfeifer, Sanja [1]
Sarlija, Natasa [1]
Affiliations
[1] Univ Josip Juraj Strossmayer Osijek, Fac Econ, Osijek, Croatia
Source
BUSINESS SYSTEMS RESEARCH JOURNAL | 2014 / Vol. 5 / Issue 03
Keywords
machine learning; support vector machines; artificial neural networks; CART classification trees; k-nearest neighbour; large-dimensional data; cross-validation
DOI
10.2478/bsrj-2014-0021
Chinese Library Classification (CLC) number
F [Economics]
Subject classification code
02
Abstract
Background: Large-dimensional data modelling often relies on variable reduction methods in the pre-processing and post-processing stages. However, such reduction usually provides less information and yields lower model accuracy. Objectives: The aim of this paper is to assess the high-dimensional classification problem of recognizing the entrepreneurial intentions of students by machine learning methods. Methods/Approach: Four methods were tested on the same dataset: artificial neural networks, CART classification trees, support vector machines, and k-nearest neighbour, in order to compare their efficiency in terms of classification accuracy. The performance of each method was compared on ten subsamples in a 10-fold cross-validation procedure in order to assess the sensitivity and specificity of each model. Results: The artificial neural network model based on a multilayer perceptron yielded a higher classification rate than the models produced by the other methods. The pairwise t-test showed a statistically significant difference between the artificial neural network and the k-nearest neighbour model, while the differences among the other methods were not statistically significant. Conclusions: The tested machine learning methods are able to learn fast and achieve high classification accuracy. However, further advancement can be assured by testing a few additional methodological refinements in machine learning methods.
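The evaluation procedure described in the abstract (10-fold cross-validation, per-fold sensitivity and specificity, and a paired t-test on fold accuracies) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, only k-nearest neighbour comes from the paper, and a nearest-centroid rule is a hypothetical stand-in for the other methods. All function names are made up for this sketch.

```python
import math
import random

def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k nearest training points (Euclidean)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = sum(train_y[i] for i in order[:k])
    return 1 if 2 * votes > k else 0

def centroid_predict(train_X, train_y, x):
    """Stand-in classifier (assumption): pick the class with the nearest mean."""
    centroids = {}
    for label in (0, 1):
        rows = [r for r, t in zip(train_X, train_y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(centroids, key=lambda c: math.dist(centroids[c], x))

def paired_t(a, b):
    """Paired t statistic for two equal-length lists of fold scores."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)
    if var == 0:
        return 0.0 if mean == 0 else math.copysign(math.inf, mean)
    return mean / math.sqrt(var / n)

# Synthetic two-class data (assumption): 200 samples, 20 features.
rng = random.Random(42)
y = [i % 2 for i in range(200)]
X = [[rng.gauss(0.5 * label, 1.0) for _ in range(20)] for label in y]

models = {"knn": knn_predict, "centroid": centroid_predict}
acc = {name: [] for name in models}
sens_spec = {name: [] for name in models}

for fold in k_fold_indices(len(X), 10):
    held_out = set(fold)
    train_X = [X[i] for i in range(len(X)) if i not in held_out]
    train_y = [y[i] for i in range(len(X)) if i not in held_out]
    truth = [y[i] for i in fold]
    for name, predict in models.items():
        preds = [predict(train_X, train_y, X[i]) for i in fold]
        acc[name].append(sum(p == t for p, t in zip(preds, truth)) / len(fold))
        sens_spec[name].append(sensitivity_specificity(truth, preds))

# With 10 folds there are 9 degrees of freedom; the two-sided 5% critical
# value of Student's t for 9 degrees of freedom is about 2.262.
t_stat = paired_t(acc["knn"], acc["centroid"])
significant = abs(t_stat) > 2.262
print(f"mean accuracy: knn={sum(acc['knn']) / 10:.3f}, "
      f"centroid={sum(acc['centroid']) / 10:.3f}, t={t_stat:.3f}, "
      f"significant at 5%: {significant}")
```

Both models are scored on identical folds, which is what makes the per-fold accuracies paired and the paired t-test applicable. Numbers printed by this sketch describe the synthetic data only and say nothing about the results reported in the paper.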
Pages: 82 - 96
Page count: 15
Related papers
50 in total
  • [41] Benchmark for filter methods for feature selection in high-dimensional classification data
    Bommert, Andrea
    Sun, Xudong
    Bischl, Bernd
    Rahnenfuehrer, Joerg
    Lang, Michel
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2020, 143
  • [42] Comparison of the Performance of Machine Learning Models in Representing High-Dimensional Free Energy Surfaces and Generating Observables
    Cendagorta, Joseph R.
    Tolpin, Jocelyn
    Schneider, Elia
    Topper, Robert Q.
    Tuckerman, Mark E.
    JOURNAL OF PHYSICAL CHEMISTRY B, 2020, 124 (18) : 3647 - 3660
  • [43] Revisiting Computational Thermodynamics through Machine Learning of High-Dimensional Data
    Srinivasan, Srikant
    Rajan, Krishna
    COMPUTING IN SCIENCE & ENGINEERING, 2013, 15 (05) : 22 - 31
  • [44] Two-stage extreme learning machine for high-dimensional data
    Liu, Peng
    Huang, Yihua
    Meng, Lei
    Gong, Siyuan
    Zhang, Guopeng
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2016, 7 (05) : 765 - 772
  • [45] Asynchronous Parallel, Sparse Approximated SVRG for High-Dimensional Machine Learning
    Shang, Fanhua
    Huang, Hua
    Fan, Jun
    Liu, Yuanyuan
    Liu, Hongying
    Liu, Jianhui
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (12) : 5636 - 5648
  • [46] A machine learning based approach towards high-dimensional mediation analysis
    Nath, Tanmay
    Caffo, Brian
    Wager, Tor
    Lindquist, Martin A.
    NEUROIMAGE, 2023, 268
  • [47] Extreme learning machine Cox model for high-dimensional survival analysis
    Wang, Hong
    Li, Gang
    STATISTICS IN MEDICINE, 2019, 38 (12) : 2139 - 2156
  • [48] Distributed Learning of Deep Sparse Neural Networks for High-dimensional Classification
    Garg, Shweta
    Krishnan, R.
    Jagannathan, S.
    Samaranayake, V. A.
    2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2018 : 1587 - 1592
  • [49] Efficient sampling of constrained high-dimensional theoretical spaces with machine learning
    Hollingsworth, Jacob
    Ratz, Michael
    Tanedo, Philip
    Whiteson, Daniel
    EUROPEAN PHYSICAL JOURNAL C, 2021, 81
  • [50] Robust High-Dimensional Factor Models with Applications to Statistical Machine Learning
    Fan, Jianqing
    Wang, Kaizheng
    Zhong, Yiqiao
    Zhu, Ziwei
    STATISTICAL SCIENCE, 2021, 36 (02) : 303 - 327