Comparison of Decision Tree Classification Methods and Gradient Boosted Trees

Cited by: 1
Authors
Dikananda, Arif Rinaldi [1]
Jumini, Sri [2]
Tarihoran, Nafan [3]
Christinawati, Santy [4]
Trimastuti, Wahyu [4]
Rahim, Robbi [5]
Affiliations
[1] ISTMIK IKMI, Cirebon, Indonesia
[2] Univ Sains Al Quran Indonesia, Wonosobo, Indonesia
[3] Univ Islam Negeri Sutan Maulana Hasanuddi, Serang, Indonesia
[4] Politekn Piksi Ganesha Bandung, Bandung, Indonesia
[5] Sekolah Tinggi Ilmu Manajemen Sukma, Medan, Indonesia
Source
TEM JOURNAL-TECHNOLOGY EDUCATION MANAGEMENT INFORMATICS | 2022 / Vol. 11 / Issue 01
Keywords
Comparison; Data mining; Classification; C4.5; Random Forest; Accuracy; RANDOM FOREST; ALGORITHM;
DOI
10.18421/TEM111-39
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The purpose of this research is to analyze the C4.5 and Random Forest algorithms for classification. The two methods were compared to see which one classifies more accurately. The case studied is the success of students at a private university. The data come from the data set at https://osf.io/jk2ac. The input attributes were gender, student, average evaluation (NEM), reading session, school origin, and presence; the output (label) was success. The analysis used RapidMiner software with the same test parameters (k-fold = 2, 3, 4, 5) and the same sampling types (stratified, linear, and shuffled sampling) for both methods. The first result shows that the k-fold test with stratified sampling achieved an average accuracy of 55.76 percent (C4.5) and 56.18 percent (Random Forest). The second result shows that the k-fold test with linear sampling achieved an average precision of 58.06 percent (C4.5) and 65.06 percent (Random Forest). The third result shows that the k-fold test with shuffled sampling achieved an average precision of 58.68 percent (C4.5) and 60.76 percent (Random Forest). Across the three tests, for the case of student success at a private university, the Random Forest method performs better than C4.5.
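The three sampling types named in the abstract differ only in how records are assigned to the k folds. The sketch below illustrates that difference in plain Python. It is not the authors' RapidMiner workflow: the function names (`linear_folds`, `shuffled_folds`, `stratified_folds`) and the toy labels are hypothetical, and RapidMiner's exact partitioning may differ in detail.

```python
import random
from collections import defaultdict


def linear_folds(n, k):
    """Linear sampling: split indices 0..n-1 into k consecutive blocks."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # spread the remainder
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def shuffled_folds(n, k, seed=0):
    """Shuffled sampling: shuffle all indices, then deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def stratified_folds(labels, k, seed=0):
    """Stratified sampling: shuffle within each class, then deal round-robin
    so each fold keeps roughly the overall class proportions."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    pos = 0
    for y in sorted(by_class):
        members = by_class[y]
        rng.shuffle(members)
        for i in members:
            folds[pos % k].append(i)
            pos += 1
    return folds


# Toy labels (invented for illustration, not taken from the study's data).
labels = ["pass"] * 6 + ["fail"] * 6
for name, folds in [
    ("linear", linear_folds(len(labels), 3)),
    ("shuffled", shuffled_folds(len(labels), 3)),
    ("stratified", stratified_folds(labels, 3)),
]:
    counts = [sum(labels[i] == "pass" for i in fold) for fold in folds]
    print(name, "-> 'pass' records per fold:", counts)
```

On label-sorted data such as these toy labels, linear sampling places a single class in the first fold, while stratified sampling keeps the class ratio equal across folds. This is one reason the same classifier can score differently under each sampling type.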
Pages: 316-322
Page count: 7