A K-Means clustering and SVM based hybrid concept drift detection technique for network anomaly detection

Cited by: 84
Authors
Jain, Meenal [1]
Kaur, Gagandeep [1]
Saxena, Vikas [1]
Affiliations
[1] JIIT, Dept CSE & IT, Noida Sect 62, Noida 201309, India
Keywords
Anomaly detection; SVM; K-Means; Clustering; Concept Drift; INTRUSION; CLASSIFICATION
DOI
10.1016/j.eswa.2022.116510
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Today's internet data consists primarily of streamed data from applications such as sensor networks, banking systems and telecommunication networks. Data stream mining, a new field of study, has been gaining popularity as a way to study the behavior of streamed data. Detection of anomalies in network traffic is also applicable in this context. However, traditional machine learning algorithms struggle to provide consistently high accuracy and produce many false alarms. This is due to the presence of concept drift in the captured data streams. Concept drift describes unknown changes in the characteristics of network data over time. Therefore, to handle concept drift, new methodologies and techniques for drift detection, understanding and adaptation are required. In this paper, we propose two techniques, Error Rate Based Concept Drift Detection and Data Distribution Based Concept Drift Detection, and study their impact. Furthermore, sliding-window-based data capture and drift analysis combined with K-Means clustering are used to reduce data size and update the training dataset. We use the Support Vector Machine (SVM) classifier for anomaly detection, and retraining of the model is initiated based on statistical tests. The experiments were performed on three datasets, namely a generated Testbed Dataset, NSL-KDD and CIDDS-2017. Detection accuracy, KL-Divergence and the Kappa statistic are used to study the severity of the concept drift in the datasets. After applying the proposed approach, the SVM achieves improved classification accuracy of 93.52%, 99.80% and 91.33% on the three datasets, respectively. The corresponding precision is 91.84%, 99.1% and 88.3%, recall is 94.30%, 99.2% and 91.7%, and F1 score is 92.9%, 99.15% and 89.6%.
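The abstract names data-distribution-based drift detection over sliding windows, with KL-Divergence as one of the measures. The record gives no algorithmic detail, so the following is only a minimal sketch of that general idea and not the authors' implementation. The function names, window size, bin count, threshold and the add-one smoothing are all assumptions chosen for illustration. The bin edges are computed from the whole stream for simplicity, which a true streaming detector could not do.

```python
import numpy as np


def kl_divergence(p_counts, q_counts, pseudo_count=1.0):
    """KL(P || Q) between two histograms, with add-one smoothing (an assumption)."""
    p = np.asarray(p_counts, dtype=float) + pseudo_count
    q = np.asarray(q_counts, dtype=float) + pseudo_count
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))


def detect_drift(stream, window=200, bins=10, threshold=0.5):
    """Return start indices of windows whose histogram diverges from the reference.

    Hypothetical helper: compares each non-overlapping window against a
    reference window and flags drift when the KL divergence exceeds `threshold`.
    """
    stream = np.asarray(stream, dtype=float)
    # Simplification: fixed bin edges derived from the full stream.
    edges = np.linspace(stream.min(), stream.max(), bins + 1)
    ref_hist, _ = np.histogram(stream[:window], bins=edges)
    drift_points = []
    for start in range(window, len(stream) - window + 1, window):
        cur_hist, _ = np.histogram(stream[start:start + window], bins=edges)
        if kl_divergence(cur_hist, ref_hist) > threshold:
            drift_points.append(start)
            # Adopt the new window as reference; in a full pipeline this is
            # where a classifier such as an SVM would be retrained.
            ref_hist = cur_hist
    return drift_points


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic one-dimensional feature whose mean shifts halfway through.
    data = np.concatenate([rng.normal(0, 1, 1000), rng.normal(5, 1, 1000)])
    print(detect_drift(data))
```

In this sketch a detected drift simply resets the reference window. The abstract indicates that the paper instead uses statistical tests to trigger SVM retraining and K-Means clustering to condense the updated training set.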
Pages: 17