Analytical Comparison between the Information Gain and Gini Index using Historical Geographical Data

Cited by: 0
Authors
Zaman, Majid [1]
Kaul, Sameer [2]
Ahmed, Muheet [2]
Affiliations
[1] Univ Kashmir, Directorate IT & SS, Srinagar, India
[2] Univ Kashmir, Dept Comp Sci, Srinagar, India
Keywords
Geographical data mining; information gain; Gini index; machine learning; decision tree
DOI
10.14569/IJACSA.2020.0110557
CLC number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
The historical geographical data of Kashmir province is spread across two disparate files with the attributes Maximum Temperature, Minimum Temperature, Humidity measured at 12 A.M., Humidity measured at 3 P.M., and rainfall, besides auxiliary parameters such as date and year. The attributes Maximum Temperature, Minimum Temperature, Humidity measured at 12 A.M., and Humidity measured at 3 P.M. are continuous in nature; in this study we apply Information Gain and the Gini Index to these attributes to convert the continuous data into discrete values, and thereafter compare and evaluate the generated results. Of the four attributes, two yield the same results under Information Gain and the Gini Index, one yields overlapping results, and only one yields conflicting results. Subsequently, the continuous-valued attributes are converted into discrete values using the Gini Index. Irrelevant attributes are not considered, and auxiliary attributes are labeled accordingly. Consequently, the data set is ready for the application of machine learning (decision tree) algorithms.
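As an illustration of the discretization step the abstract describes, the following is a minimal sketch of choosing a binary split point for one continuous attribute, once by information gain (equivalently, minimum weighted entropy) and once by the Gini index. It is not the authors' implementation: the binary-split strategy, the function names (`entropy`, `gini`, `best_threshold`), and the toy temperature and rainfall values are all assumptions made for illustration and are not taken from the Kashmir data set.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels, impurity):
    """Return (threshold, weighted impurity) of the best binary split.

    Candidate thresholds are midpoints between consecutive distinct
    sorted values. Minimizing the weighted entropy of the two halves
    is the same as maximizing information gain.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_t, best_score = None, float("inf")
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * impurity(left) + len(right) * impurity(right)) / n
        if score < best_score:
            best_t, best_score = (lo + hi) / 2, score
    return best_t, best_score


# Hypothetical toy data (NOT from the paper): max temperature vs. rain.
max_temp = [10, 12, 15, 18, 22, 25, 28, 30]
rain = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]

t_entropy, _ = best_threshold(max_temp, rain, entropy)
t_gini, _ = best_threshold(max_temp, rain, gini)

# Discretize the attribute with the Gini-selected threshold.
discrete = ["low" if v <= t_gini else "high" for v in max_temp]
```

On cleanly separable toy data like this the two criteria pick the same threshold; on real data they can disagree, which is the kind of same, overlapping, or conflicting outcome the abstract reports across the four attributes.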
Pages: 429-440
Page count: 12