Automated Data Pre-processing via Meta-learning

Cited by: 15
Authors
Bilalli, Besim [1]
Abello, Alberto [1]
Aluja-Banet, Tomas [1]
Wrembel, Robert [2]
Affiliations
[1] Univ Politecn Cataluna, Barcelona, Spain
[2] Poznan Univ Tech, Poznan, Poland
Source
MODEL AND DATA ENGINEERING | 2016 / Vol. 9893
Keywords
DOI
10.1007/978-3-319-45547-1_16
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
A data mining algorithm may perform differently on datasets with different characteristics, e.g., it might perform better on a dataset with continuous attributes rather than with categorical attributes, or the other way around. As a matter of fact, a dataset usually needs to be pre-processed. Taking into account all the possible pre-processing operators, there exists a staggeringly large number of alternatives and non-experienced users become overwhelmed. We show that this problem can be addressed by an automated approach, leveraging ideas from meta-learning. Specifically, we consider a wide range of data pre-processing techniques and a set of data mining algorithms. For each data mining algorithm and selected dataset, we are able to predict the transformations that improve the result of the algorithm on the respective dataset. Our approach will help non-expert users to more effectively identify the transformations appropriate to their applications, and hence to achieve improved results.
Pages: 194-208
Page count: 15