Data preprocessing in predictive data mining

Cited by: 91
Authors
Alexandropoulos, Stamatios-Aggelos N. [1]
Kotsiantis, Sotiris B. [1]
Vrahatis, Michael N. [1]
Affiliations
[1] Univ Patras, Dept Math, Computat Intelligence Lab, GR-26110 Patras, Greece
Keywords
COMBINING INSTANCE SELECTION; MISSING VALUES; DISCRETIZATION METHOD; REDUCTION TECHNIQUES; PROTOTYPE SELECTION; ATTRIBUTE NOISE; IMBALANCED DATA; ALGORITHM; IMPUTATION; CLASSIFICATION;
DOI
10.1017/S026988891800036X
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Many issues influence the success of data mining on a given problem. Two of the most important are the representation and the quality of the dataset. Specifically, if the data contain much redundant, irrelevant, noisy, or unreliable information, knowledge discovery becomes very difficult. It is well known that data preparation steps require significant processing time in machine learning tasks. It would be very helpful if there were preprocessing algorithms that performed reliably and effectively across all datasets, but this is impossible. To this end, we present the most well-known and widely used up-to-date algorithms for each step of data preprocessing in the framework of predictive data mining.
Pages: 33