UNBALANCED BIG DATA CLASSIFICATION BASED ON IMPROVED RANDOM FOREST ALGORITHM

Times cited: 0
Authors
Zheng, Xin [1]
Huang, Li [2]
Affiliations
[1] Jiangxi Univ Technol, Artificial Intelligence Dept, 115 Ziyang Ave, Nanchang 330098, Peoples R China
[2] Jiangxi Univ Technol, Informat Engn Coll, 115 Ziyang Ave, Nanchang 330098, Peoples R China
Source
INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL | 2024 / Vol. 20 / No. 02
Keywords
Improved RF algorithm; Unbalanced data; Classification recognition; DIAGNOSIS;
DOI
10.24507/ijicic.20.02.575
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Big data analytics has developed rapidly in recent years, and data mining has become a positive driver of development in many fields. However, data in many fields are severely unbalanced, and current research on classifying such big data still has many limitations. To address this problem, the study uses a K-means algorithm based on class distinction to reduce the dimensionality of the data, and an unscented Kalman filter (UKF) with Sage-Husa adaptive noise estimation to reduce the noise in the data. The denoised, dimension-reduced data are then used to build an improved random forest algorithm (K-U-S-H-RF). However, when K-S-H-RF was used to classify low-dimensional unbalanced data, the study found that the random forest algorithm did not take into account the actual distribution of the data set and therefore classified the data poorly. For this reason, the study introduced cost sensitivity into the misclassification-cost calculation of the decision trees and into the voting stage. The random forest is also parallelized following the MapReduce paradigm to further optimize K-S-H-RF. On this basis, the study constructs an unbalanced big data classification model based on the improved random forest. The model can effectively classify unbalanced big data, offers a new path for applying big data in more fields, and supports the development of the big data era.
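The abstract says cost sensitivity is introduced into both the decision-tree error calculation and the voting, but it does not give the formulas. The Python sketch below shows one common way to do this, under stated assumptions: `cost[c]` is a hypothetical per-class penalty for misclassifying a sample whose true class is `c`, each tree is weighted by one minus its cost-weighted error on held-out data, and each vote is scaled by the tree weight and the cost of the class it predicts. All function names are hypothetical; this is an illustration, not the authors' K-S-H-RF implementation.

```python
from collections import defaultdict


def cost_weighted_error(y_true, y_pred, cost):
    """Share of the total misclassification cost incurred by one tree.

    cost[c] is an assumed penalty for misclassifying a sample of true class c.
    """
    total = sum(cost[y] for y in y_true)
    wrong = sum(cost[y] for y, p in zip(y_true, y_pred) if y != p)
    return wrong / total if total else 0.0


def tree_weights(val_true, per_tree_preds, cost):
    # Hypothetical weighting scheme: a tree with lower cost-weighted error
    # on held-out data receives a larger voting weight.
    return [1.0 - cost_weighted_error(val_true, preds, cost)
            for preds in per_tree_preds]


def cost_sensitive_vote(votes, weights, cost):
    """Combine one sample's per-tree votes into a single label.

    Each vote adds (tree weight * cost of the predicted class) to that
    class's score, so votes for costly minority classes count for more.
    """
    score = defaultdict(float)
    for label, w in zip(votes, weights):
        score[label] += w * cost[label]
    return max(score, key=score.get)
```

For example, with `cost = {0: 1.0, 1: 5.0}`, three equally weighted trees voting `[0, 0, 1]` give class 0 a score of 2 and class 1 a score of 5, so the minority class 1 is chosen where a plain majority vote would pick class 0. In a real system the cost values would have to be chosen from the class ratio or domain knowledge; the numbers here are purely illustrative.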
Pages: 575-590
Number of pages: 16