A Distributed Framework for Predictive Analytics Using Big Data and MapReduce Parallel Programming

Cited by: 0
Authors
Natesan P. [1]
Sathishkumar V.E. [2]
Mathivanan S.K. [3]
Venkatasen M. [3]
Jayagopal P. [3]
Allayear S.M. [4]
Affiliations
[1] Department of Computer Science and Engineering, Kongu Engineering College, Perundurai, Erode, Tamil Nadu
[2] Department of Industrial Engineering, Hanyang University, Seoul
[3] School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu
[4] Department of Multimedia and Creative Technology, Daffodil International University, Daffodil Smart City, Khagan, Ashulia, Dhaka
Keywords
Fault tolerance; Large dataset; Linear regression; Open source software; Predictive analytics
DOI
10.1155/2023/6048891
Abstract
With the advancement of Internet technologies and the rapid growth of World Wide Web applications, the volume of digital data has grown tremendously, ushering the digital world into a new era of big data. Many existing data processing technologies lack the consistency and scalability needed to handle the complexity and size of such datasets. Recently, many distributed data processing and programming models have been proposed and implemented to handle big data applications. The MapReduce programming model, available as an open-source implementation in Apache Hadoop, is the foremost model for data-intensive and computation-intensive applications owing to its inherent scalability, fault tolerance, and simplicity. In this research article, a new approach for predicting target labels in big data applications, named MR-MLR, is developed using a multiple linear regression algorithm and the MapReduce programming model. The approach achieves optimum values for the mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R²), demonstrating its effectiveness for prediction in big data applications. © 2023 P. Natesan et al.
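The abstract does not say how MR-MLR splits the regression work across mappers and reducers. One common way to fit multiple linear regression with MapReduce is for each mapper to compute the partial sums XᵀX and Xᵀy over its input split, and for a reducer to add them and solve the normal equations (XᵀX)β = Xᵀy. The sketch below simulates that pattern in a single Python process and then computes MAE, RMSE, and R² from their standard definitions. It is an illustrative assumption, not the authors' implementation; the names `map_split`, `reduce_partials`, `fit_mr_mlr`, and `metrics` are hypothetical.

```python
from functools import reduce
from math import sqrt
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]
Row = Tuple[Sequence[float], float]  # (features, target)


def map_split(rows: Sequence[Row], p: int) -> Tuple[Matrix, Vector]:
    """Hypothetical map phase: partial X^T X and X^T y for one input split."""
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for features, target in rows:
        x = [1.0, *features]  # leading 1.0 models the intercept term
        for i in range(p):
            xty[i] += x[i] * target
            for j in range(p):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def reduce_partials(a: Tuple[Matrix, Vector], b: Tuple[Matrix, Vector]) -> Tuple[Matrix, Vector]:
    """Hypothetical reduce phase: partial sums are additive, so add element-wise."""
    (xtx_a, xty_a), (xtx_b, xty_b) = a, b
    xtx = [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(xtx_a, xtx_b)]
    xty = [u + v for u, v in zip(xty_a, xty_b)]
    return xtx, xty


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_mr_mlr(splits: Sequence[Sequence[Row]], n_features: int) -> Vector:
    """Return [intercept, b1, ..., bk]; on a cluster the map calls run in parallel."""
    p = n_features + 1
    partials = [map_split(split, p) for split in splits]
    xtx, xty = reduce(reduce_partials, partials)
    return solve(xtx, xty)


def predict(beta: Vector, features: Sequence[float]) -> float:
    return beta[0] + sum(b * f for b, f in zip(beta[1:], features))


def metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Tuple[float, float, float]:
    """MAE, RMSE, and R^2 (assumes y_true is not constant, so SS_tot > 0)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = sqrt(ss_res / n)
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return mae, rmse, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy data generated from y = 2 + 3*x1 - x2, divided into two "input splits".
    pts = [(0, 1), (1, 0), (2, 3), (3, 1), (4, 4), (5, 2)]
    data = [((x1, x2), 2 + 3 * x1 - x2) for x1, x2 in pts]
    beta = fit_mr_mlr([data[:3], data[3:]], n_features=2)
    preds = [predict(beta, f) for f, _ in data]
    print(beta, metrics([y for _, y in data], preds))
```

Because the partial sums are additive, the result does not depend on how rows are assigned to splits, which is what allows the map phase to scale out. The sketch leaves out Hadoop-specific concerns such as input formats, combiners, and task retries.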