A hybrid deep learning-based unsupervised anomaly detection in high dimensional data

Citations: 0
Authors
Muneer A. [1, 2]
Taib S.M. [1, 2]
Fati S.M. [3]
Balogun A.O. [1]
Aziz I.A. [1, 2]
Affiliations
[1] Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, Seri Iskandar
[2] Centre for Research in Data Science (CERDAS), Universiti Teknologi PETRONAS, Perak, Seri Iskandar
[3] Information Systems Department, Prince Sultan University, Riyadh
Keywords
Anomaly detection; Autoencoder; Deep learning; Hybrid model; Outlier detection; Unsupervised learning
DOI
10.32604/cmc.2022.020732
Abstract
Anomaly detection in high-dimensional data is a critical research issue with serious implications for real-world problems. Many issues in this field remain unsolved, and several modern anomaly detection methods struggle to maintain adequate accuracy because of the highly descriptive nature of big data. This phenomenon, referred to as the “curse of dimensionality”, affects traditional techniques in terms of both accuracy and performance. This research therefore proposes a hybrid model based on a Deep Autoencoder Neural Network (DANN) with five layers, trained to reduce the difference between the input and the output. The proposed model was applied to a real-world gas turbine (GT) dataset that contains 87620 columns and 56 rows. During the experiment, two issues were investigated and solved to enhance the results. The first is the class imbalance of the dataset, which was solved using the SMOTE technique. The second is poor performance, which was addressed by selecting a suitable optimization algorithm. Several optimization algorithms were investigated and tested, including stochastic gradient descent (SGD), RMSprop, Adam, and Adamax; of these, Adamax showed the best results when used to train the DANN model. The experimental results show that the proposed model can detect anomalies by efficiently reducing the high dimensionality of the dataset, with an accuracy of 99.40%, an F1-score of 0.9649, an Area Under the Curve (AUC) of 0.9649, and a minimal loss during hybrid model training. © 2022 Tech Science Press. All rights reserved.
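The abstract names SMOTE as the remedy for class imbalance but gives no detail. The sketch below illustrates only the core idea of SMOTE: each synthetic minority sample is placed at a random position on the line segment between a minority point and one of its nearest minority neighbours. It is not the authors' implementation. The function name `smote_oversample`, the toy data, and the values of `k` and `seed` are assumptions made for illustration.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by SMOTE-style interpolation.

    minority: list of equal-length feature lists (minority class only).
    k: nearest minority neighbours considered per point (assumed value).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy minority class with two features per sample.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_oversample(minority, n_new=6)
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay inside the region already occupied by the minority class. In practice a maintained library implementation would normally be preferred over a hand-written one.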
Pages: 6073-6088
Page count: 15