A Comparison of Machine Learning Methods in a High-Dimensional Classification Problem

Cited by: 9
Authors
Zekic-Susac, Marijana [1]
Pfeifer, Sanja [1]
Sarlija, Natasa [1]
Affiliation
[1] Univ Josip Juraj Strossmayer Osijek, Fac Econ, Osijek, Croatia
Source
BUSINESS SYSTEMS RESEARCH JOURNAL | 2014 / Vol. 5 / Issue 03
Keywords
machine learning; support vector machines; artificial neural networks; CART classification trees; k-nearest neighbour; large-dimensional data; cross-validation
DOI
10.2478/bsrj-2014-0021
Chinese Library Classification
F [Economics]
Subject classification code
02
Abstract
Background: Modelling of large-dimensional data often relies on variable reduction methods in the pre-processing and post-processing stages. However, such reduction usually provides less information and yields lower model accuracy. Objectives: The aim of this paper is to assess the high-dimensional classification problem of recognizing the entrepreneurial intentions of students by machine learning methods. Methods/Approach: Four methods were tested on the same dataset: artificial neural networks, CART classification trees, support vector machines, and k-nearest neighbour, in order to compare their efficiency in terms of classification accuracy. The performance of each method was compared on ten subsamples in a 10-fold cross-validation procedure, computing the sensitivity and specificity of each model. Results: The artificial neural network model based on a multilayer perceptron yielded a higher classification rate than the models produced by the other methods. The pairwise t-test showed a statistically significant difference between the artificial neural network and the k-nearest neighbour model, while the differences among the other methods were not statistically significant. Conclusions: The tested machine learning methods are able to learn fast and achieve high classification accuracy. However, further advancement can be achieved by testing a few additional methodological refinements in machine learning methods.
Pages: 82-96
Number of pages: 15
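The evaluation procedure summarized in the abstract (10-fold cross-validation, per-model sensitivity and specificity, and a pairwise t-test on the results) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the function names, the contiguous fold split, the 0/1 label coding, and all numbers in the usage note are assumptions made for the example, and the abstract does not state exactly how the t-test was set up.

```python
import math
from statistics import mean, stdev


def k_fold_indices(n_samples, k=10):
    """Split range(n_samples) into k contiguous, nearly equal test folds.

    Assumption: the data were shuffled beforehand; the abstract does not
    describe how the ten subsamples were drawn.
    """
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 1 and 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic and degrees of freedom for per-fold scores.

    Raises ValueError if all per-fold differences are identical, because
    the statistic is undefined when their standard deviation is zero.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are equal")
    return mean(diffs) / (sd / math.sqrt(n)), n - 1
```

A possible use: train each classifier on nine folds, score it on the held-out fold, collect ten per-fold accuracies per method, and pass two such lists to `paired_t_statistic`. With ten folds the statistic has 9 degrees of freedom, so an absolute value above roughly 2.262 would indicate a difference at the two-sided 5% level. Per-fold scores from cross-validation are not fully independent, so such a test is best read as indicative.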