Discretization of data using Boolean transformations and information theory based evaluation criteria

Cited by: 7
Authors
Jankowski, C. [1]
Reda, D. [1]
Mankowski, M. [2]
Borowik, G. [1]
Affiliations
[1] Warsaw Univ Technol, Inst Telecommun, 15-19 Nowowiejska St, PL-00665 Warsaw, Poland
[2] Warsaw Univ Technol, Inst Radioelect & Multimedia Technol, PL-00665 Warsaw, Poland
Keywords
machine learning; discretization; discernibility function; logic minimization; information theory; entropy
DOI
10.1515/bpasts-2015-0105
Chinese Library Classification (CLC) number
T [Industrial technology]
Subject classification code
08
Abstract
Discretization is one of the most important steps in decision table preprocessing. Transforming continuous attribute values into discrete intervals affects all subsequent analysis by data mining methods; in particular, the accuracy of the resulting predictions depends heavily on the quality of the discretization. The paper describes three new heuristic algorithms, based on Boolean reasoning, for discretizing numeric data. It also introduces an entropy-based evaluation of discretization, which is used to compare the proposed algorithms with leading academic data analysis software. Treating discretization as a form of data compression, the average compression ratio achieved on the databases examined in the paper is 8.02, while the consistency of the databases is maintained at 100%.
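The abstract does not spell out the three algorithms, so the following is only a minimal sketch of the general Boolean-reasoning approach they build on, under stated assumptions. Each pair of objects with different decisions yields a clause listing every candidate cut (midpoint between neighbouring attribute values) that separates them; the conjunction of these clauses is the discernibility function, and any set of cuts hitting every clause gives a consistent discretization. The classical greedy heuristic shown here (repeatedly take the cut that discerns the most remaining pairs) is a standard rough-set technique, not necessarily one of the paper's algorithms. The entropy check is likewise an assumption: it computes the conditional entropy of the decision given the discretized attributes, which is zero exactly when the discretized table is consistent; the paper's own criterion, and its definition of compression ratio, may differ and are not reproduced. All function names and the toy table are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def candidate_cuts(values):
    """Midpoints between consecutive distinct values of one numeric attribute."""
    v = sorted(set(values))
    return [(lo + hi) / 2 for lo, hi in zip(v, v[1:])]


def discernibility_clauses(rows, decisions):
    """CNF discernibility function as a list of clauses (sets of (attr, cut)).

    One clause per pair of objects with different decisions, holding every
    cut that separates the two objects. A set of cuts is a consistent
    discretization iff it hits every clause.
    """
    cuts = [candidate_cuts([r[a] for r in rows]) for a in range(len(rows[0]))]
    clauses = []
    for i, j in combinations(range(len(rows)), 2):
        if decisions[i] == decisions[j]:
            continue
        clause = set()
        for a, attr_cuts in enumerate(cuts):
            lo, hi = sorted((rows[i][a], rows[j][a]))
            clause.update((a, c) for c in attr_cuts if lo < c < hi)
        if clause:  # an empty clause means the table was already inconsistent
            clauses.append(clause)
    return clauses


def greedy_cuts(clauses):
    """Greedy hitting set (approximate shortest prime implicant):
    repeatedly pick the cut occurring in the most unsatisfied clauses."""
    remaining, chosen = list(clauses), set()
    while remaining:
        counts = Counter(cut for clause in remaining for cut in clause)
        # ties broken deterministically: lowest attribute index, then lowest cut
        best = max(counts, key=lambda c: (counts[c], -c[0], -c[1]))
        chosen.add(best)
        remaining = [cl for cl in remaining if best not in cl]
    return sorted(chosen)


def discretize(rows, cuts):
    """Replace each value by the index of the interval it falls into."""
    per_attr = {}
    for a, c in cuts:
        per_attr.setdefault(a, []).append(c)
    return [tuple(sum(x > c for c in per_attr.get(a, [])) for a, x in enumerate(r))
            for r in rows]


def is_consistent(rows, decisions):
    """True iff identical rows never carry different decisions."""
    seen = {}
    return all(seen.setdefault(r, d) == d for r, d in zip(rows, decisions))


def entropy(items):
    """Shannon entropy (bits) of the empirical distribution of items."""
    n = len(items)
    return -sum(k / n * log2(k / n) for k in Counter(items).values())


def conditional_entropy(rows, decisions):
    """H(decision | rows) = H(rows, decision) - H(rows); zero iff consistent."""
    return entropy(list(zip(rows, decisions))) - entropy(rows)


if __name__ == "__main__":
    # hypothetical toy decision table: two numeric attributes, binary decision
    rows = [(0.8, 2.0), (1.0, 0.5), (1.3, 3.0), (1.4, 1.0),
            (1.4, 2.0), (1.6, 3.0), (1.3, 1.0)]
    decisions = [1, 0, 0, 1, 0, 1, 1]
    cuts = greedy_cuts(discernibility_clauses(rows, decisions))
    disc = discretize(rows, cuts)
    print("cuts:", cuts)
    print("consistent:", is_consistent(disc, decisions))
    print("H(d | discretized):", conditional_entropy(disc, decisions))
```

Because every chosen cut set hits every clause, the discretized table stays consistent by construction; the greedy choice only tries to keep the number of cuts (and hence intervals) small, which is what makes discretization behave like compression.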
Pages: 923-932
Number of pages: 10