Big Data with Decision Tree Induction

Cited by: 2
Authors
Sabah, Shabnam [1]
Anwar, Sara Zumerrah Binte [1]
Afroze, Sadia [1]
Azad, Md. Abulkalam [1]
Shatabda, Swakkhar [1]
Farid, Dewan Md. [1]
Affiliation
[1] United Int Univ, Dept Comp Sci & Engn, Madani Ave, Dhaka 1212, Bangladesh
Source
2019 13TH INTERNATIONAL CONFERENCE ON SOFTWARE, KNOWLEDGE, INFORMATION MANAGEMENT AND APPLICATIONS (SKIMA), 2019
Keywords
Big Data; Classification; Decision Tree; RainForest; Tree Merging; FRAMEWORK
DOI
10.1109/skima47702.2019.8982419
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Big data mining is one of the major research challenges in machine learning and data mining in the present digital era. Big data is characterized by three Vs: (1) volume, a massive amount of data (too many bytes); (2) velocity, high-speed streaming data (too high a rate); and (3) variety, data coming from many different sources (too many sources). Collecting and managing real-life big data is difficult because big data is too large to be kept together on a single machine. Therefore, we need advanced relational database management systems with parallel computing to deal with big data. Mining knowledge from big data with traditional machine learning and data mining techniques is a major challenge that attracts computational intelligence researchers. In this paper, we have used the decision tree (DT) induction method for mining big data. Decision tree induction is one of the most preferred and well-known supervised learning techniques; it is a top-down, recursive, divide-and-conquer algorithm and requires little prior knowledge to construct a classifier. Traditional DT algorithms such as Iterative Dichotomiser 3 (ID3), C4.5 (a successor of ID3), and Classification and Regression Trees (CART) were generally designed for mining relatively small datasets, so a more scalable decision tree learning approach is needed for mining big data. We have built several trees using two scalable decision tree algorithms, RainForest and the Bootstrapped Optimistic Algorithm for Tree construction (BOAT), on seven benchmark datasets from the KEEL repository and the UCI Machine Learning Repository, and we have compared the performance of RainForest and BOAT. We have also proposed a decision tree merging approach, since merging decision trees is a complex and challenging task.
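The abstract describes decision tree induction as a top-down, recursive, divide-and-conquer algorithm. The Python sketch below illustrates that idea with an ID3-style information-gain split; it is an assumption-laden illustration, not the authors' RainForest, BOAT, or tree-merging implementation. Split selection is computed only from per-node AVC-sets (for each value of an attribute, the count of each class label), which, to our understanding, is the summary structure RainForest relies on so that choosing a split does not require the raw records. All names (`avc_set`, `info_gain`, `build_tree`, `predict`) and the `outlook`/`windy` toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(class_counts):
    """Shannon entropy (bits) of a mapping class label -> count."""
    total = sum(class_counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in class_counts.values() if c > 0)


def avc_set(rows, labels, attr):
    """AVC-set of one attribute at one node: attribute value -> Counter of class labels."""
    avc = defaultdict(Counter)
    for row, label in zip(rows, labels):
        avc[row[attr]][label] += 1
    return avc


def info_gain(rows, labels, attr):
    """Information gain of splitting on attr, computed only from its AVC-set."""
    avc = avc_set(rows, labels, attr)
    parent = sum(avc.values(), Counter())  # node's class counts, rebuilt from the AVC-set
    n = sum(parent.values())
    remainder = sum(sum(counts.values()) / n * entropy(counts)
                    for counts in avc.values())
    return entropy(parent) - remainder


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def build_tree(rows, labels, attrs):
    """Top-down, recursive, divide-and-conquer induction (ID3-style, hypothetical sketch)."""
    if len(set(labels)) == 1:            # pure node -> leaf
        return labels[0]
    if not attrs:                        # no attributes left -> majority leaf
        return majority(labels)
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    if info_gain(rows, labels, best) <= 1e-12:   # no informative split -> majority leaf
        return majority(labels)
    node = {"attr": best, "default": majority(labels), "children": {}}
    remaining = [a for a in attrs if a != best]
    # Divide: partition rows by the chosen attribute's value; conquer each part recursively.
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node["children"][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], remaining)
    return node


def predict(tree, row):
    """Follow branches; an unseen value falls back to that node's majority class."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["attr"]], tree["default"])
    return tree


# Hypothetical toy data (not from the paper's KEEL/UCI benchmark datasets).
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
    {"outlook": "overcast", "windy": "no"},
]
labels = ["no", "no", "yes", "no", "yes"]
tree = build_tree(rows, labels, ["outlook", "windy"])
print(predict(tree, {"outlook": "rain", "windy": "no"}))  # prints yes
```

Here `outlook` is chosen at the root because its information gain (about 0.571 bits) exceeds that of `windy` (about 0.420 bits). A scalable variant would presumably build each node's AVC-sets in one pass over data held on disk or spread across machines; this sketch keeps everything in memory purely for readability.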
Pages: 6