MapReduce-Based Improved Random Forest Model for Massive Educational Data Processing and Classification

Cited by: 0
Authors
Wei Xu
Vinh Truong Hoang
Affiliations
[1] Xi’an University of Finance and Economics, Business School
[2] Ho Chi Minh City Open University, Faculty of Computer Science
Source
Mobile Networks and Applications | 2021 / Vol. 26
Keywords
Machine learning; Data classification model; Big data processing; MapReduce
DOI
Not available
Abstract
This paper takes educational data mining as its research theme. It mines existing massive educational data, compares the analysis methods of existing data models, and proposes an improved random forest (RF) reference model. By introducing a feature weighting scheme, the information gain of each feature is calculated, and an evaluation index is used to improve the existing data analysis. Simulation results show that the improved model is highly efficient compared with existing classification models. To resolve the performance bottleneck of a single node in multiple data classification tasks in the era of big data, a classification and prediction model for graduates’ large-scale employment data, based on a distributed improved RF algorithm, is proposed. The MapReduce distributed computing framework is used to complete the serialized writing and deserialized loading of the trained model between the local disk and the distributed file system, thereby realizing the distributed extension of the large-scale data classification model based on the improved RF model.
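The abstract mentions computing the information gain of each feature as part of a feature weighting scheme but does not give the formula. A minimal sketch follows, assuming categorical features and assuming that weights are simply each feature's information gain normalized to sum to 1. The function names, the normalization rule, and the toy data are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy reduction obtained by splitting the labels on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def feature_weights(columns, labels):
    """Normalize per-feature information gain so the weights sum to 1.

    This normalization is an assumption made for illustration only.
    """
    gains = {name: information_gain(values, labels) for name, values in columns.items()}
    total_gain = sum(gains.values())
    if total_gain == 0:
        # No feature is informative: fall back to uniform weights.
        return {name: 1 / len(gains) for name in gains}
    return {name: g / total_gain for name, g in gains.items()}


# Hypothetical toy data: two categorical features and an employment label.
columns = {
    "major": ["cs", "cs", "math", "math"],
    "internship": ["yes", "no", "yes", "no"],
}
labels = ["employed", "employed", "unemployed", "unemployed"]
weights = feature_weights(columns, labels)
```

In this toy data, `major` separates the two classes perfectly while `internship` carries no information about the label, so the whole weight goes to `major`. Such weights could then bias which features a random forest samples at each split, although how the paper applies them is not stated in the abstract.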
Pages: 191-199
Number of pages: 8