The Application of high-dimensional Data Classification by Random Forest based on Hadoop Cloud Computing Platform

Times cited: 0
Author
Li, Chong [1]
Affiliation
[1] Chongqing Vocat Inst Engn, Informat Engn Sch, Chongqing 402260, Peoples R China
Keywords
MapReduce
DOI
10.3303/CET1651065
Chinese Library Classification (CLC) number
T [Industrial technology]
Subject classification code
08
Abstract
High-dimensional data involves a number of uncertain factors, such as sparse features, repeated features, and high computational complexity. The random forest algorithm is an ensemble classification method composed of numerous weak classifiers. It can overcome a number of practical problems, such as small sample sizes, over-learning, nonlinearity, the curse of dimensionality, and local minima, and it has good application prospects in the field of high-dimensional data classification. To improve classification accuracy and computational efficiency, a novel classification method based on the Hadoop cloud computing platform is proposed. First, the Bagging algorithm is applied to the data sets to obtain different data subsets. Second, the Random Forest is built by training decision trees under the MapReduce architecture. Finally, the data sets are classified by the Random Forest. In the experiments, three high-dimensional data sets are used as test subjects. The experimental results show that the classification accuracy of the proposed method is higher than that of a stand-alone Random Forest, and that its computational efficiency is significantly improved.
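The three-step pipeline described in the abstract (Bagging, distributed decision-tree training, classification by the forest) can be illustrated with a small single-process Python sketch. It is a minimal sketch under stated assumptions, not the paper's Hadoop implementation: the "map" phase is simulated by a list comprehension in which each hypothetical `map_task` draws one bootstrap sample and grows one depth-limited tree on a random feature subset, and the "reduce" phase simply concatenates the trees into a forest. Gini impurity, the depth limit, the feature-subset size, and every function name here are illustrative assumptions; a real job would use the Hadoop MapReduce API rather than these Python functions.

```python
import random
from collections import Counter
from functools import reduce

# Assumption: a row is (feature_vector, label); labels must be hashable and
# not tuples, because internal tree nodes are represented as tuples.


def bootstrap(data, rng):
    """Bagging step: draw len(data) rows with replacement."""
    return [rng.choice(data) for _ in range(len(data))]


def gini(rows):
    """Gini impurity of the labels in rows."""
    counts = Counter(label for _, label in rows)
    n = len(rows)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(rows, n_features, rng):
    """Search a random feature subset for the lowest weighted-Gini threshold split."""
    dim = len(rows[0][0])
    best = None
    for f in rng.sample(range(dim), min(n_features, dim)):
        for t in sorted({x[f] for x, _ in rows}):
            left = [r for r in rows if r[0][f] <= t]
            right = [r for r in rows if r[0][f] > t]
            if not left or not right:
                continue  # threshold does not split the rows
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    return best


def build_tree(rows, depth, max_depth, n_features, rng):
    """Grow a depth-limited tree; leaves are labels, internal nodes are tuples."""
    labels = [label for _, label in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    split = best_split(rows, n_features, rng)
    if split is None:  # no sampled feature separates these rows
        return majority
    _, f, t, left, right = split
    return (f, t,
            build_tree(left, depth + 1, max_depth, n_features, rng),
            build_tree(right, depth + 1, max_depth, n_features, rng))


def predict_tree(node, x):
    """Walk from the root to a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def map_task(task_id, data, n_features, max_depth):
    """Hypothetical mapper: one bootstrap sample -> one tree, seeded per task."""
    rng = random.Random(task_id)
    return [build_tree(bootstrap(data, rng), 0, max_depth, n_features, rng)]


def train_forest(data, n_trees=10, n_features=2, max_depth=3):
    """Simulated 'map' phase, then 'reduce' by concatenating the tree lists."""
    partial_forests = [map_task(i, data, n_features, max_depth) for i in range(n_trees)]
    return reduce(lambda a, b: a + b, partial_forests, [])


def classify(forest, x):
    """Majority vote over all trees in the forest."""
    votes = Counter(predict_tree(tree, x) for tree in forest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data: four identical features that all equal the binary label.
    data = [([float(i % 2)] * 4, i % 2) for i in range(20)]
    forest = train_forest(data, n_trees=15)
    print(classify(forest, [1.0] * 4), classify(forest, [0.0] * 4))
```

Seeding each simulated mapper with its task id keeps the sketch reproducible. Because each tree depends only on its own bootstrap sample, tree construction is independent across tasks, which is what makes the training step a natural fit for parallel map tasks; how the paper actually partitions data and trees across Hadoop nodes is not specified here.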
Pages: 385-390
Number of pages: 6