A Comparison of Machine Learning Methods in a High-Dimensional Classification Problem

Cited by: 9
Authors
Zekic-Susac, Marijana [1]
Pfeifer, Sanja [1]
Sarlija, Natasa [1]
Affiliations
[1] Univ Josip Juraj Strossmayer Osijek, Fac Econ, Osijek, Croatia
Source
BUSINESS SYSTEMS RESEARCH JOURNAL | 2014, Vol. 5, No. 3
Keywords
machine learning; support vector machines; artificial neural networks; CART classification trees; k-nearest neighbour; large-dimensional data; cross-validation;
DOI
10.2478/bsrj-2014-0021
Chinese Library Classification
F [Economics];
Discipline classification code
02;
Abstract
Background: Large-dimensional data modelling often relies on variable reduction methods in the pre-processing and post-processing stages. However, such reduction usually discards information and lowers the accuracy of the model. Objectives: This paper addresses the high-dimensional classification problem of recognizing students' entrepreneurial intentions using machine learning methods. Methods/Approach: Four methods were tested on the same dataset in order to compare their classification accuracy: artificial neural networks, CART classification trees, support vector machines, and k-nearest neighbour. The performance of each method was compared on ten subsamples in a 10-fold cross-validation procedure, with sensitivity and specificity computed for each model. Results: The artificial neural network model based on a multilayer perceptron yielded a higher classification rate than the models produced by the other methods. The pairwise t-test showed a statistically significant difference between the artificial neural network and the k-nearest neighbour model, while the differences among the other methods were not statistically significant. Conclusions: The tested machine learning methods are able to learn fast and achieve high classification accuracy. However, further advancement can be sought by testing a few additional methodological refinements in machine learning methods.
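The evaluation quantities named in the abstract (sensitivity, specificity, and a paired t-test over cross-validation folds) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names and the sample values are hypothetical, chosen only for illustration, and the paper's actual data and software are not given in this record.

```python
import math
from statistics import mean, stdev


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    # Assumes each class occurs at least once, so neither denominator is zero.
    return tp / (tp + fn), tn / (tn + fp)


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic for two models scored on the same folds."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


if __name__ == "__main__":
    # Hypothetical labels and per-fold accuracies, not taken from the paper.
    sens, spec = sensitivity_specificity([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
    print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
    t = paired_t_statistic([0.80, 0.90, 0.70], [0.70, 0.70, 0.60])
    print(f"paired t = {t:.3f}")
```

With ten folds, the statistic would be compared against a t distribution with nine degrees of freedom; the three-fold input above only keeps the example short.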
Pages: 82-96
Page count: 15