Improvement of ID3 algorithm based on simplified information entropy and coordination degree

Cited by: 3
Authors
Wang Y. [1]
Li Y. [1]
Song Y. [2]
Rong X. [1]
Zhang S. [3]
Affiliations
[1] School of Control Science and Engineering, Shandong University, Jinan
[2] School of Mechanical, Electrical and Information Engineering, Shandong University, Weihai
[3] Department of Electrical Engineering and Information Technology, Shandong University of Science and Technology, Jinan
Funding
National Natural Science Foundation of China
Keywords
Coordination degree; Decision tree; ID3 algorithm; Information entropy
DOI
10.3390/a10040124
Abstract
The decision tree algorithm is a core technology in data classification mining, and the ID3 (Iterative Dichotomiser 3) algorithm is a well-known decision tree algorithm that has achieved good results in classification mining. However, ID3 has several disadvantages, such as a bias toward attributes with many values, high complexity, and large scale. In this paper, an improved ID3 algorithm is proposed that combines a simplified information entropy based on different weights with the coordination degree from rough set theory. The traditional ID3 algorithm and the proposed one are compared fairly on three common sample data sets using decision tree classifiers. For the first two sample sets, which are small, the proposed algorithm outperforms ID3 in running time and tree structure but not in accuracy. For the third sample set, which is large, the proposed algorithm improves on ID3 in running time, tree structure, and accuracy. The experimental results show that the proposed algorithm is effective and viable. © 2017 by the authors.
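The abstract's motivation is the bias of ID3's attribute-selection criterion, information gain, toward attributes with many values. The sketch below illustrates that baseline behaviour only. It is not the authors' improved algorithm: their weighted simplified entropy and rough-set coordination degree are not reproduced here. The function names (`entropy`, `information_gain`, `best_attribute`) and the toy data set are hypothetical, chosen for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy H(D) = -sum_k p_k * log2(p_k) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Classic ID3 gain: Gain(D, A) = H(D) - sum_v |D_v|/|D| * H(D_v)."""
    n = len(labels)
    # Partition the class labels by the value each row takes on `attr`.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    conditional = sum(len(part) / n * entropy(part) for part in partitions.values())
    return entropy(labels) - conditional


def best_attribute(rows, labels, attrs):
    """ID3 splits on the attribute with the highest information gain."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy data: "id" is unique per row (a many-valued attribute with
# no predictive meaning), while "outlook" is a genuine three-valued feature.
rows = [
    {"id": 1, "outlook": "sunny"},
    {"id": 2, "outlook": "sunny"},
    {"id": 3, "outlook": "rain"},
    {"id": 4, "outlook": "rain"},
    {"id": 5, "outlook": "overcast"},
    {"id": 6, "outlook": "sunny"},
]
labels = ["no", "no", "yes", "yes", "yes", "yes"]

if __name__ == "__main__":
    print(round(information_gain(rows, labels, "id"), 3))       # 0.918
    print(round(information_gain(rows, labels, "outlook"), 3))  # 0.459
    print(best_attribute(rows, labels, ["id", "outlook"]))      # id
```

Because every `id` value isolates a single row, each partition has zero entropy. The meaningless attribute therefore receives the maximum possible gain and is chosen over `outlook`. This is the multi-value bias that the improved algorithm is designed to counter.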