A Structural Sampling Technique for Better Decision Trees

Times cited: 0
Authors
Sug, Hyontai [1]
Affiliations
[1] Dongseo Univ, Div Comp & Informat Engn, Pusan, South Korea
Source
2009 FIRST ASIAN CONFERENCE ON INTELLIGENT INFORMATION AND DATABASE SYSTEMS | 2009
Keywords
decision trees; sampling; CART; C4.5
DOI
10.1109/ACIIDS.2009.24
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Because data mining problems involve large amounts of data, sampling is necessary for the task to succeed. Decision trees were developed for prediction, and finding decision trees with smaller error rates has been a major goal. This paper proposes a structural sampling technique based on a generated decision tree, where the tree is produced by a fast and dirty tree generation algorithm. Experiments with several sample sizes and representative decision tree algorithms showed that the method is more effective than conventional random sampling with respect to decision tree size and error rate, especially for small sample sizes.
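The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of one plausible reading: grow a shallow tree quickly, treat its leaves as strata, and draw from each leaf in proportion to its size. The function names (`best_split`, `leaf_groups`, `structural_sample`), the Gini criterion, the depth limit, and the toy data are all assumptions made for illustration, not the paper's method.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature, threshold) that lowers weighted Gini, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def leaf_groups(rows, labels, indices, depth):
    """Partition row indices into the leaves of a depth-limited tree."""
    split = None
    if depth > 0:
        split = best_split([rows[i] for i in indices],
                           [labels[i] for i in indices])
    if split is None:
        return [indices]
    f, t = split
    left = [i for i in indices if rows[i][f] <= t]
    right = [i for i in indices if rows[i][f] > t]
    return (leaf_groups(rows, labels, left, depth - 1)
            + leaf_groups(rows, labels, right, depth - 1))

def structural_sample(rows, labels, sample_size, depth=2, seed=0):
    """Draw from every leaf in proportion to its size (assumed scheme).

    Each leaf contributes at least one row, so the total can differ
    slightly from sample_size because of rounding.
    """
    rng = random.Random(seed)
    leaves = leaf_groups(rows, labels, list(range(len(rows))), depth)
    chosen = []
    for leaf in leaves:
        k = max(1, round(sample_size * len(leaf) / len(rows)))
        chosen.extend(rng.sample(leaf, min(k, len(leaf))))
    return sorted(chosen)

# Toy data (hypothetical): one numeric feature, a 10% minority class.
rows = [(x,) for x in range(100)]
labels = [int(x >= 90) for x in range(100)]
sample = structural_sample(rows, labels, sample_size=10, depth=2, seed=0)
```

Under this assumed scheme, the minority-class leaf is always represented in the sample, whereas a plain random draw of 10 rows could miss it entirely. That is one way a tree-guided sample might help at small sample sizes.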
Pages: 24-27
Number of pages: 4