Anomaly detection in NetFlow network traffic using supervised machine learning algorithms

Cited by: 20
Authors
Fosic, Igor [1]
Zagar, Drago [2]
Grgic, Kresimir [2]
Krizanovic, Visnja [2]
Affiliations
[1] HEP Telekomunikacije Doo PS Osijek, M Divalta 199, Osijek 31000, Croatia
[2] Josip Juraj Strossmayer Univ Osijek, Fac Elect Engn Comp Sci & Informat Technol Osijek, Kneza Trpimira 2B, HR-31000 Osijek, Croatia
Keywords
Supervised algorithm; Machine learning; Anomaly classification; NetFlow; Imbalanced dataset
DOI
10.1016/j.jii.2023.100466
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Anomaly detection is an important method for monitoring network traffic, where it is important to successfully distinguish normal traffic from abnormal traffic. For this purpose, existing classification algorithms can be used as part of the machine learning (ML) process. In this paper, several classification algorithms (Stochastic Gradient Descent (SGD), Support Vector Machines (SVM), K-Nearest Neighbor (K-NN), Gaussian Naive Bayes (GNB), Decision Tree (DT), Random Forest (RF), AdaBoost (AB)) were tested on the public UNSW-NB15 dataset. Different encoding methods and different ratios of training to test data were evaluated to find the optimal classifier parameters. Because normal and abnormal network traffic are imbalanced in the data, both standard performance scores and additional classification performance scores (F2-score, Area Under ROC Curve (AUC)) were used, since the latter describe the obtained results better. The RF classifier obtained the best results (F2-score = 97.68%, AUC = 98.47%); a representative subset of the original dataset was used to shorten computation time. Features in the reference dataset were reduced by 82% and selected to follow the structure of the NetFlow data stream. In contrast to similar studies, this paper compares several algorithms for anomaly detection and selects the best one for NetFlow data streams. The F2-score and AUC metrics are applied and reach very high values, whereas classic metrics do not reflect realistic accuracy on imbalanced datasets. Label encoding (LE) required less time than the One-hot (OH) encoding used in similar research while achieving the same accuracy. The novelty of this paper lies in the optimization of the ML process and in the analysis of how the ratio of training to test data, different encoding methods for categorical features, and feature reduction affect results on NetFlow data streams.
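The abstract's point about imbalanced data can be illustrated with a minimal sketch. It is not the authors' code: the toy labels, the two hypothetical classifiers, and the helper names (`confusion_counts`, `fbeta_score`, `accuracy`) are assumptions made only for illustration. It uses the standard F-beta definition, F_beta = (1 + beta^2) * TP / ((1 + beta^2) * TP + beta^2 * FN + FP), with beta = 2 so that missed anomalies (false negatives) weigh more than false alarms.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN), treating `positive` as the anomaly class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    return tp, fp, fn, tn


def fbeta_score(y_true, y_pred, beta=2.0, positive=1):
    """F-beta score; beta=2 weights recall more heavily than precision."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def accuracy(y_true, y_pred):
    """Fraction of samples classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: 95 normal flows (0) followed by 5 anomalous flows (1).
y_true = [0] * 95 + [1] * 5

# Hypothetical classifier A: labels every flow as normal.
always_normal = [0] * 100

# Hypothetical classifier B: 5 false alarms, catches 4 of the 5 anomalies.
detector = [0] * 90 + [1] * 5 + [1] * 4 + [0]

for name, pred in [("always_normal", always_normal), ("detector", detector)]:
    print(name, "accuracy:", round(accuracy(y_true, pred), 4),
          "F2:", round(fbeta_score(y_true, pred), 4))
```

On this toy data the classifier that never flags an anomaly has the higher plain accuracy, yet its F2-score is zero, while the classifier that actually detects anomalies scores far higher on F2. This is the kind of gap that motivates reporting F2 and AUC alongside standard scores.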
Pages: 10