Data-driven decision-making in classification algorithm selection

Cited by: 4
Authors
Oreski, Dijana [1]
Redep, Nina Begicevic [1]
Affiliations
[1] Univ Zagreb, Fac Org & Informat, Varazhdin, Croatia
Keywords
Data characteristics; data-driven classification; CRISP-DM; decision-making; meta-learning
DOI
10.1080/12460125.2018.1468168
Chinese Library Classification number
C93 [Management Science]; O22 [Operations Research]
Discipline classification code
070105; 12; 1201; 1202; 120202
Abstract
The selection of the appropriate classification algorithm for a given data-set is an important and complex issue, full of research challenges. In this paper, we present a meta-learning-based framework to improve decision-making in the selection of classification algorithms based on data-set characteristics. We study the effectiveness of the proposed framework with 32 data-sets. Three classification algorithms (neural networks, decision trees, and k-nearest neighbours) were trained and applied to data-sets with different characteristics, in order to review the performance of the algorithms in the presence of noise in the data, interaction between features, and a small or a large ratio between the number of instances and the number of features. Our results show that feature noise is the most important predictor of the decision regarding the choice of classification algorithm, and data-driven classification is found to be useful in this scenario.
Pages: 248-255
Page count: 8