Research on the Application of Random Forest-based Feature Selection Algorithm in Data Mining Experiments

Times cited: 0
Author
Wang, Huan [1]
Affiliation
[1] Southwest Forestry Univ, Coll Big Data & Intelligence Engn, Kunming 650224, Yunnan, Peoples R China
Keywords
Random forest; SVM; machine learning; big data; feature selection; best-first search; rough set theory
DOI
10.14569/IJACSA.2023.0141054
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
High-dimensional big data presents substantial challenges for Machine Learning (ML) algorithms, mainly because the curse of dimensionality leads to computational inefficiency and an increased risk of overfitting. Various dimensionality reduction and Feature Selection (FS) techniques have been developed to alleviate these challenges. Random Forest (RF), a widely used Ensemble Learning Method (ELM), is recognized for its high accuracy and robustness, and it also has a lesser-known capability for effective FS. Although specialized RF models have been designed for FS, they often struggle with computational efficiency on large datasets. To address these challenges, this study proposes a novel Feature Selection Model (FSM) integrated with data reduction techniques, termed Dynamic Correlated Regularized Random Forest (DCRRF). The architecture operates in four phases: preprocessing, Feature Reduction (FR) using Best-First Search with Rough Set Theory (BFS-RST), FS through DCRRF, and feature efficacy assessment using a Support Vector Machine (SVM) classifier. Benchmarked on four gene expression datasets, the proposed model outperforms existing RF-based methods in both computational efficiency and classification accuracy. The study thus introduces a robust and efficient approach to feature selection in high-dimensional big-data scenarios.
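The feature reduction phase named in the abstract rests on rough set theory, whose central quantity is the dependency degree: the fraction of samples whose class is fully determined by a chosen attribute subset. The sketch below illustrates that idea only. It is not the paper's BFS-RST procedure, whose details this record does not give; it substitutes a simple greedy forward search (in the style of QuickReduct) for best-first search, and the function names `dependency` and `greedy_reduct` and the toy data are hypothetical.

```python
from collections import defaultdict


def dependency(rows, labels, attrs):
    """Rough-set dependency degree of the labels on the attributes `attrs`.

    Rows that agree on every attribute in `attrs` form one equivalence
    class. The result is the fraction of rows that fall in a class whose
    members all share a single label (the positive region).
    """
    if not rows:
        return 0.0
    classes = defaultdict(list)
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        classes[key].append(label)
    consistent = sum(len(ls) for ls in classes.values() if len(set(ls)) == 1)
    return consistent / len(rows)


def greedy_reduct(rows, labels):
    """Greedy forward selection (an assumption, not the paper's search).

    Repeatedly adds the attribute that gives the highest dependency degree
    (ties go to the lowest index) until the subset explains the labels as
    well as the full attribute set does.
    """
    n_attrs = len(rows[0])
    target = dependency(rows, labels, list(range(n_attrs)))
    selected = []
    best = dependency(rows, labels, selected)
    while best < target:
        candidates = [a for a in range(n_attrs) if a not in selected]
        score, neg_a = max(
            (dependency(rows, labels, selected + [a]), -a) for a in candidates
        )
        selected.append(-neg_a)
        best = score
    return selected


# Toy, already-discretised data (hypothetical): attribute 0 alone
# determines the label, so the other attributes are redundant.
rows = [
    ("high", "low", "a"),
    ("high", "high", "a"),
    ("low", "low", "a"),
    ("low", "high", "a"),
]
labels = ["tumor", "tumor", "normal", "normal"]
reduct = greedy_reduct(rows, labels)
print(reduct)
```

On real gene expression data the continuous values would first need to be discretised, and the reduced attribute set would then be passed to the later selection and classification phases.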
Pages: 505-518
Number of pages: 14