Fast Decision Tree Algorithm

Citations: 9
Authors
Purdila, Vasile [1]
Pentiuc, Stefan-Gheorghe [1]
Affiliations
[1] Stefan Cel Mare Univ Suceava, Suceava 720229, Romania
Keywords
algorithm; chi-merge; classification; data compression; decision tree; pruning;
DOI
10.4316/AECE.2014.01010
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
There is growing interest in processing large amounts of data with well-known decision-tree learning algorithms. It is essential to build a decision tree on a large dataset as fast as possible, without a substantial decrease in accuracy and with as little memory as possible. In this paper we present an improved C4.5 algorithm that uses a compression mechanism to store the training and test data in memory. We also present a very fast tree pruning algorithm. Our experiments show that the presented algorithms perform better than C5.0 in terms of speed and classification accuracy in most cases, at the expense of tree size: the resulting trees are larger than the ones produced by C5.0. The data compression and pruning algorithms can be easily parallelized to achieve further speedup.
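The abstract does not spell out the compression mechanism. Since the keyword list names chi-merge, one plausible reading, assumed here and not confirmed by the record, is that continuous attributes are discretized with ChiMerge and each value is then stored as a small interval code. The Python sketch below illustrates that idea only. The function names, the 2.706 threshold (the 0.90 chi-square quantile for one degree of freedom), and the one-byte-per-value encoding are illustrative assumptions, not the authors' implementation.

```python
from bisect import bisect_right
from collections import Counter


def chi2_adjacent(a, b):
    """Chi-square statistic for two adjacent intervals (class Counters).

    Simplification: only classes that occur in at least one of the two
    intervals are considered, so every expected count is positive.
    """
    classes = set(a) | set(b)
    n = sum(a.values()) + sum(b.values())
    chi2 = 0.0
    for row in (a, b):
        row_total = sum(row.values())
        for c in classes:
            expected = row_total * (a[c] + b[c]) / n
            chi2 += (row[c] - expected) ** 2 / expected
    return chi2


def chimerge(values, labels, threshold=2.706, max_intervals=None):
    """Return sorted lower bounds of the intervals found by ChiMerge."""
    # Start with one interval per distinct attribute value.
    counts = {}
    for v, y in zip(values, labels):
        counts.setdefault(v, Counter())[y] += 1
    bounds = sorted(counts)
    hists = [counts[v] for v in bounds]

    while len(bounds) > 1:
        scores = [chi2_adjacent(hists[i], hists[i + 1])
                  for i in range(len(hists) - 1)]
        i = min(range(len(scores)), key=scores.__getitem__)
        too_many = max_intervals is not None and len(bounds) > max_intervals
        # Stop once the most similar pair is significantly different.
        if scores[i] >= threshold and not too_many:
            break
        # Merge interval i+1 into interval i.
        hists[i] = hists[i] + hists[i + 1]
        del hists[i + 1], bounds[i + 1]
    return bounds


def encode(values, bounds):
    """Store each value as a one-byte interval code (assumes <= 256 intervals).

    Values below the first bound are clamped into interval 0.
    """
    return bytes(max(0, bisect_right(bounds, v) - 1) for v in values)
```

With this encoding a column of floating-point values shrinks to one byte per record, and test data can be encoded with the bounds learned from the training data.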
Pages: 65-68
Page count: 4