Unsupervised Discretization Method based on Adjustable Intervals

Cited by: 3
Authors
Bennasar, Mohamed [1]
Setchi, Rossitza [1]
Hicks, Yulia [1]
Affiliations
[1] Cardiff Univ, Sch Engn, Cardiff CF24 3AA, S Glam, Wales
Source
ADVANCES IN KNOWLEDGE-BASED AND INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS | 2012 / Vol. 243
Keywords
unsupervised discretization; supervised discretization; classification accuracy
DOI
10.3233/978-1-61499-105-2-79
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Discretization transforms continuous attributes into discrete ones. It makes the learning step of many classification algorithms faster and more accurate. Although many efficient supervised discretization methods have been proposed, unsupervised methods such as Equal Width Discretization (EWD) and Equal Frequency Discretization (EFD) are still in use, especially for datasets where class labels are not available. Each of these algorithms has its drawbacks. To improve the classification accuracy of EWD, this paper proposes a new method based on adjustable intervals. The new method is tested on benchmark datasets from the UCI repository of machine learning databases, with the C4.5 classification algorithm used to measure classification accuracy. The experimental results show that the method improves classification accuracy by about 5% compared to the conventional EWD and EFD methods, and performs as well as the supervised Entropy Minimization Discretization (EMD) method.
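For readers unfamiliar with the two unsupervised baselines named in the abstract, the following is a minimal sketch of conventional EWD and EFD. It does not implement the paper's adjustable-interval method, which the abstract does not specify. The function names (`equal_width_cuts`, `equal_frequency_cuts`, `discretize`) and the sample data are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def equal_width_cuts(values, k):
    """Interior cut points of k intervals of equal width over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def equal_frequency_cuts(values, k):
    """Interior cut points so that each interval holds roughly len(values)/k points."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for i in range(1, k):
        cut = ordered[(i * n) // k]
        # Skip duplicate cut points, which arise when many values are tied.
        if not cuts or cut > cuts[-1]:
            cuts.append(cut)
    return cuts


def discretize(values, cuts):
    """Map each value to the index of the interval it falls into."""
    return [bisect_right(cuts, v) for v in values]


# Illustrative data with one outlier (an assumption, not taken from the paper).
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0]
ewd = discretize(data, equal_width_cuts(data, 3))
efd = discretize(data, equal_frequency_cuts(data, 3))
print(ewd)  # [0, 0, 0, 0, 0, 0, 0, 0, 2]
print(efd)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

The example illustrates one commonly cited drawback of EWD: a single outlier stretches the range, so almost all values collapse into one interval, while EFD keeps the intervals balanced by count.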
Pages: 79-87
Page count: 9