A Distributed Decision Tree Algorithm and Its Implementation on Big Data Platforms

Cited by: 5
Authors
Chen, Jingxiang [1, 3]
Wang, Tao [1]
Abbey, Ralph [1]
Pingenot, Joseph [2]
Affiliations
[1] SAS Inst Inc, Cary, NC 27513 USA
[2] Google Inc, Pittsburgh, PA 15206 USA
[3] Univ N Carolina, Dept Biostat, Chapel Hill, NC 27599 USA
Source
PROCEEDINGS OF 3RD IEEE/ACM INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA 2016) | 2016
Keywords
Big Data; CHAID; Data Mining; Decision Tree; Distributed Algorithm; KS-Tree
DOI
10.1109/DSAA.2016.64
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Decision tree algorithms are very popular in the field of data mining. This paper proposes a distributed decision tree algorithm and shows examples of its implementation on big data platforms. The major contribution of this paper is the novel KS-Tree algorithm, which builds a decision tree in a distributed environment. KS-Tree is applied to several real-world data mining problems and compared with state-of-the-art decision tree techniques implemented in R and Apache Spark. The results show that KS-Tree can achieve better results, especially on large data sets. Furthermore, we demonstrate that KS-Tree can be applied to various data mining tasks, such as variable selection.
Pages: 752 - 761
Page count: 10
Related papers
  • [1] Implementation of Data Preprocessing Techniques on Distributed Big Data Platforms
    Celik, Oguz
    Hasanbasoglu, Muruvvet
    Aktas, Mehmet S.
    Kalipsiz, Oya
    Kanli, Alper Nebi
    2019 4TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ENGINEERING (UBMK), 2019, : 73 - 78
  • [2] Big Data with Decision Tree Induction
    Sabah, Shabnam
    Anwar, Sara Zumerrah Binte
    Afroze, Sadia
    Azad, Md. Abulkalam
    Shatabda, Swakkhar
    Farid, Dewan Md.
    2019 13TH INTERNATIONAL CONFERENCE ON SOFTWARE, KNOWLEDGE, INFORMATION MANAGEMENT AND APPLICATIONS (SKIMA), 2019,
  • [3] Research on Concept Drift Detection for Decision Tree Algorithm in the Stream of Big Data
    Liu, Shangdong
    Lu, Lili
    Zhang, Yongpan
    Xin, Tong
    Ji, Yimu
    Wang, Ruchuan
    PARALLEL ARCHITECTURE, ALGORITHM AND PROGRAMMING, PAAP 2017, 2017, 729 : 237 - 246
  • [4] Application of big data analysis with decision tree for the foot disorder
    Jung-Kyu Choi
    Keun-Hwan Jeon
    Yonggwan Won
    Jung-Ja Kim
    Cluster Computing, 2015, 18 : 1399 - 1404
  • [5] Application of big data analysis with decision tree for the foot disorder
    Choi, Jung-Kyu
    Jeon, Keun-Hwan
    Won, Yonggwan
    Kim, Jung-Ja
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2015, 18 (04): : 1399 - 1404
  • [6] Self-Tuning Parameters for Decision Tree Algorithm Based on Big Data Analytics
    Hafez, Manar Mohamed
    Elfakharany, Essam Eldin F.
    Abohany, Amr A.
    Thabet, Mostafa
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 75 (01): : 943 - 958
  • [7] The Impact of Distributed Data in Big Data Platforms on Organizations
    Koren, Oded
    Binyaminov, Matan
    Perel, Nir
    PROCEEDINGS OF THE FUTURE TECHNOLOGIES CONFERENCE (FTC) 2018, VOL 2, 2019, 881 : 1024 - 1036
  • [8] A decision tree classifier for credit assessment problems in big data environments
    Chern, Ching-Chin
    Lei, Weng-U
    Huang, Kwei-Long
    Chen, Shu-Yi
    INFORMATION SYSTEMS AND E-BUSINESS MANAGEMENT, 2021, 19 (01) : 363 - 386
  • [9] A decision tree classifier for credit assessment problems in big data environments
    Ching-Chin Chern
    Weng-U Lei
    Kwei-Long Huang
    Shu-Yi Chen
    Information Systems and e-Business Management, 2021, 19 : 363 - 386
  • [10] Big Data Decision Tree Algorithm Based on Equal-arrival Privacy Budget Allocation
    Shang T.
    Zhao Z.
    Shu W.
    Liu J.
    Gongcheng Kexue Yu Jishu/Advanced Engineering Sciences, 2019, 51 (02): : 130 - 136