A Framework for Improvement a Decision Tree Learning Algorithm Using K-NN

Cited: 0
Authors
Kurematsu, Masaki [1 ]
Hakura, Jun [1 ]
Fujita, Hamido [1 ]
Institutions
[1] Iwate Prefectural Univ, Fac Software & Informat, Takizawa, Iwate, Japan
Source
NEW TRENDS IN SOFTWARE METHODOLOGIES, TOOLS AND TECHNIQUES | 2014 / Vol. 265
Keywords
Decision Tree Learning Algorithm; K-NN; ID3;
DOI
10.3233/978-1-61499-434-3-206
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we propose a modified decision tree learning algorithm. Whereas existing approaches improve the traditional decision tree learning algorithm by modifying the learning phase, we modify the prediction phase: our approach builds a decision tree with a traditional decision tree learning algorithm and predicts the class label of a new data item with K-NN. A traditional decision tree learning algorithm predicts a class label from the ratio of class labels in a leaf node. When a data set is hard to separate by class label, leaf nodes contain many data items with mixed class labels, which lowers the accuracy rate, and it is difficult to prepare a training data set that avoids this. We therefore use K-NN to predict the class label from the data items in the leaf node. To evaluate our approach, we ran an experiment on a subset of the open data sets in the UCI Machine Learning Repository, comparing our approach with ID3, one of the traditional decision tree learning algorithms, and with K-NN. The experimental results show that our approach outperforms ID3 when leaf nodes contain many data items, and performs like ID3 when they contain only a few, so our approach is a useful modification of decision tree learning. Because the learning process is unchanged, the readability of the decision tree is preserved. In addition, our approach outperforms K-NN; we think the decision tree acts as data cleaning for K-NN, which suggests that our approach is also useful for K-NN. Although the experiment demonstrates the advantage of our approach, some data items are still predicted incorrectly. In future work we will evaluate the experimental results and the process in detail, ascertain the causes of the errors, and consider how to modify our approach to correct them; normalization is likely to be one useful method. We also have to evaluate the new approach on additional open data sets.
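The scheme the abstract describes can be sketched as follows (an illustrative sketch, not the authors' code): grow a decision tree as usual, but at prediction time run K-NN over the training items stored in the reached leaf instead of taking the leaf's majority label. The `HybridTree` class, the parameters `k` and `min_leaf`, and the use of numeric features with threshold splits and Euclidean distance are assumptions for illustration; the paper builds its tree with ID3.

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a list of class labels (0 for an empty list)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(X, y):
    # Pick the (feature, threshold) pair with the highest information gain.
    base, best = entropy(y), None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            gain = (base
                    - len(left) / len(y) * entropy(left)
                    - len(right) / len(y) * entropy(right))
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best

class HybridTree:
    """Decision tree whose leaves classify by K-NN instead of a majority vote."""

    def __init__(self, k=3, min_leaf=4):
        self.k, self.min_leaf = k, min_leaf

    def fit(self, X, y):
        self.root = self._grow(list(X), list(y))
        return self

    def _grow(self, X, y):
        split = best_split(X, y) if len(set(y)) > 1 and len(y) > self.min_leaf else None
        if split is None or split[0] <= 0:
            return ("leaf", X, y)          # keep the leaf's training items
        _, f, t = split
        left = [(r, lab) for r, lab in zip(X, y) if r[f] <= t]
        right = [(r, lab) for r, lab in zip(X, y) if r[f] > t]
        return ("node", f, t,
                self._grow(*map(list, zip(*left))),
                self._grow(*map(list, zip(*right))))

    def predict(self, x):
        node = self.root
        while node[0] == "node":           # route x down to its leaf as usual
            _, f, t, left, right = node
            node = left if x[f] <= t else right
        _, leaf_X, leaf_y = node
        # K-NN over the leaf's own training items replaces the majority vote.
        nearest = sorted(zip(leaf_X, leaf_y),
                         key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))
        return Counter(lab for _, lab in nearest[:self.k]).most_common(1)[0][0]
```

When a leaf is pure or small the K-NN vote coincides with the usual majority vote, which matches the reported behavior of performing like ID3 on small leaves while improving on large, mixed ones.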
Pages: 206-212
Page count: 7