Fault prediction for distributed computing Hadoop clusters using real-time higher order differential inputs to SVM: Zedacross

Cited by: 0
Authors
Pinto J. [1]
Jain P. [1]
Kumar T. [1]
Affiliations
[1] Indian Institute of Information Technology, Campus MNIT Jaipur, Kota, 1st Floor Prabha Bhavan, Jaipur
Keywords
Fault prediction; Ganglia; Hadoop; Higher order differential; SVM
DOI
10.1504/IJICS.2020.105155
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Hadoop distributed computing clusters are used worldwide for high-performance computation. Hardware and software faults occur often, causing losses of both data and computation time. This paper proposes a fault prediction tool called Zedacross, which combines machine learning with cluster monitoring tools. First, the paper presents a model that is trained on the resource usage statistics of a normally functioning Hadoop cluster and can then be used to predict and detect faults in real time. Second, the paper explains the novel idea of using higher order differentials as inputs to an SVM for highly accurate fault prediction. Predicting system faults from resource usage statistics in real time, with minimum delay, will play a vital role in deciding whether jobs need to be rescheduled or the cluster dynamically up-scaled. To demonstrate the effectiveness of the design, a Java utility was built to perform cluster fault monitoring. The results obtained from running the system on various test cases demonstrate that the proposed method is accurate and effective. © 2020 Inderscience Enterprises Ltd.
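The abstract does not say how the higher order differentials are computed, so the following is only an illustrative sketch, not the Zedacross implementation. It assumes that a resource metric is sampled at equal intervals and that successive backward differences of the latest samples serve as the higher order inputs. The function names and the CPU readings are hypothetical.

```python
from typing import List, Sequence


def finite_differences(samples: Sequence[float], order: int) -> List[float]:
    """Return the order-th backward difference series of equally spaced samples."""
    if order < 0:
        raise ValueError("order must be non-negative")
    series = list(samples)
    for _ in range(order):
        # Each pass replaces the series with its pairwise differences,
        # shortening it by one element.
        series = [later - earlier for earlier, later in zip(series, series[1:])]
    return series


def feature_vector(samples: Sequence[float], max_order: int) -> List[float]:
    """Build [x_t, dx_t, d2x_t, ...] from the most recent samples."""
    if len(samples) < max_order + 1:
        raise ValueError("need at least max_order + 1 samples")
    # Take the latest value of each difference series, from order 0 upward.
    return [finite_differences(samples, k)[-1] for k in range(max_order + 1)]


# Made-up CPU load readings, one per polling interval (assumption, not paper data).
cpu_load = [10.0, 12.0, 15.0, 25.0]
features = feature_vector(cpu_load, max_order=2)
print(features)
```

A vector of this kind, built per metric and per node, is the sort of input that could be passed to an SVM classifier; the choice of metrics, the maximum order, and the SVM configuration used in the paper are not given in this record.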
Pages: 181-198
Number of pages: 17