Web-S4AE: a semi-supervised stacked sparse autoencoder model for web robot detection

Cited by: 0
Authors
Rikhi Ram Jagat
Dilip Singh Sisodia
Pradeep Singh
Affiliations
[1] National Institute of Technology Raipur, Department of Computer Science and Engineering
Source
Neural Computing and Applications | 2023 / Volume 35
Keywords
Web robot detection; Semi-supervised learning; Machine learning; Deep learning; Deep feature extraction; Stacked sparse autoencoder
DOI
Not available
Abstract
Web robots are automated programs used for both benign and malicious activities, such as website indexing and monitoring or unauthorized content scraping and scalping. Several methods detect web robots from their footprints and behavior, but their accuracy and efficiency depend heavily on labeled web log data, while countless robot-generated requests are added to these logs every day. Exhaustive and accurate manual labeling of reconstructed sessions is time-consuming and challenging, and detection becomes even harder when the data are unlabeled or only partially labeled. To address these issues, we reformulate web robot detection as a semi-supervised learning problem and propose a deep learning-based Semi-Supervised Stacked Sparse AutoEncoder (Web-S4AE). The model classifies web robots using content-based features together with features extracted from web access log data. Experiments on publicly available web log data from a library and information portal were used to assess its performance. Web-S4AE is trained in two phases: the first phase trains the model on unlabeled data to extract hidden representations, and the second phase fine-tunes it with labeled data. The results suggest that incorporating more unlabeled data can significantly improve the classifier's performance. Web-S4AE was also compared with other models, namely the Decision Tree (DT), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and Multi-Layer Perceptron (MLP).
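For illustration only, the sketch below shows how such a two-phase scheme can be realized: each sparse autoencoder layer is pre-trained on unlabeled session features with a reconstruction loss plus a KL sparsity penalty, and the trained encoders are then stacked under a classification head and fine-tuned on labeled sessions. This is a minimal PyTorch assumption, not the authors' implementation; layer sizes, the sparsity target rho, the penalty weight beta, and helper names such as SparseAE and pretrain are all illustrative.

```python
# Minimal sketch (assumption, not the paper's code): two-phase training of a
# stacked sparse autoencoder for robot-vs-human session classification.
import torch
import torch.nn as nn

class SparseAE(nn.Module):
    """One autoencoder layer: sigmoid encoder + linear decoder."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.Sigmoid())
        self.dec = nn.Linear(hid_dim, in_dim)

    def forward(self, x):
        h = self.enc(x)
        return self.dec(h), h

def kl_sparsity(h, rho=0.05, eps=1e-8):
    # KL divergence between the target activation rho and the mean hidden activation.
    rho_hat = h.mean(dim=0).clamp(eps, 1 - eps)
    return (rho * torch.log(rho / rho_hat)
            + (1 - rho) * torch.log((1 - rho) / (1 - rho_hat))).sum()

def pretrain(ae, X, epochs=50, beta=1e-3, lr=1e-3):
    # Phase 1: unsupervised reconstruction + sparsity penalty on unlabeled data.
    opt = torch.optim.Adam(ae.parameters(), lr=lr)
    mse = nn.MSELoss()
    for _ in range(epochs):
        recon, h = ae(X)
        loss = mse(recon, X) + beta * kl_sparsity(h)
        opt.zero_grad(); loss.backward(); opt.step()
    return ae.enc  # keep only the trained encoder

# Stand-in data: 30 numeric features per reconstructed session (illustrative).
X_u = torch.rand(1024, 30)             # unlabeled sessions
X_l = torch.rand(128, 30)              # labeled sessions
y_l = torch.randint(0, 2, (128,))      # 0 = human, 1 = robot

# Greedy layer-wise pre-training, then stack encoders under a classifier head.
enc1 = pretrain(SparseAE(30, 16), X_u)
enc2 = pretrain(SparseAE(16, 8), enc1(X_u).detach())
clf = nn.Sequential(enc1, enc2, nn.Linear(8, 2))

# Phase 2: supervised fine-tuning of the whole stack with labeled sessions.
opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()
for _ in range(100):
    loss = ce(clf(X_l), y_l)
    opt.zero_grad(); loss.backward(); opt.step()
```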
Pages: 17883-17898
Page count: 15