Exploratory study on Class Imbalance and solutions for Network Traffic Classification

Cited by: 39
Authors
Egea Gomez, Santiago [1]
Hernandez-Callejo, Luis [1]
Carro Martinez, Belen [1]
Sanchez-Esguevillas, Antonio J. [1]
Affiliation
[1] Univ Valladolid, Escuela Tecn Super Ingenieros Telecomunicac, E-47011 Valladolid, Spain
Keywords
Machine Learning; Network management; Class Imbalance; Network Traffic Classification; Multiclass imbalance; Neural networks; Data sets; Identification; Classifiers; Impact; SMOTE
DOI
10.1016/j.neucom.2018.07.091
CLC classification number
TP18 [Artificial intelligence theory]
Subject classification numbers
081104; 0812; 0835; 1405
Abstract
Network Traffic Classification is a fundamental component of network management, and the fast-paced advances in Machine Learning have motivated the application of learning techniques to identify network traffic. The intrinsic features of Internet networks lead to imbalanced class distributions when datasets are built, a phenomenon called Class Imbalance that is attracting increasing attention in many research fields. Despite the performance losses caused by Class Imbalance, this issue has not been thoroughly studied in Network Traffic Classification, and some previous works are limited to a few solutions and/or adopted misleading methodological approaches. In this article, we deal with Class Imbalance in Network Traffic Classification, studying the presence of this phenomenon and analyzing a wide range of solutions in two different Internet environments: a lab network and a high-speed backbone. Specifically, we experimented with 21 data-level algorithms, six ensemble methods and one cost-level approach. Throughout the experiments, we applied the most recent methodological practices for imbalanced problems, such as the DOB-SCV validation approach and the adopted performance metrics. Last but not least, we present and share with the research community our parameter-tuning strategies and the algorithm implementations we used to adapt binary methods to multiclass problems, including two ensemble techniques that, to the best of our knowledge, are used for the first time in Machine Learning. Our experimental results reveal that some techniques mitigated Class Imbalance with notable benefits for traffic classification models. More specifically, some algorithms achieved increases greater than 8% in overall accuracy and greater than 4% in AUC-ROC in the most challenging network scenario. (C) 2019 Elsevier B.V. All rights reserved.
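As a rough illustration of what a data-level technique does, the following is a minimal sketch of the basic SMOTE interpolation applied to a toy two-class dataset, together with the imbalance ratio (largest class size divided by smallest class size) before and after resampling. It is not the authors' implementation: the function names `imbalance_ratio`, `smote_class` and `smote_multiclass`, the toy `web`/`p2p` feature vectors, and the multiclass strategy of oversampling every class up to the majority-class size are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest class."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def smote_class(samples, n_new, k=5, rng=random):
    """Basic SMOTE for one class: interpolate between a sample and a random
    one of its k nearest same-class neighbours (Euclidean distance)."""
    if len(samples) < 2:
        raise ValueError("SMOTE needs at least two samples of the class")
    k = min(k, len(samples) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        base = samples[i]
        # k nearest neighbours of `base` inside its own class, excluding itself
        neighbours = sorted(
            (samples[j] for j in range(len(samples)) if j != i),
            key=lambda s: math.dist(base, s),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def smote_multiclass(X, y, k=5, seed=0):
    """Hypothetical multiclass wrapper: oversample every non-majority class
    with SMOTE until it matches the majority-class size."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        if count < target:
            members = [x for x, lab in zip(X, y) if lab == label]
            new_points = smote_class(members, target - count, k, rng)
            X_out.extend(new_points)
            y_out.extend([label] * len(new_points))
    return X_out, y_out


if __name__ == "__main__":
    # Toy two-feature "flows": 6 samples of a hypothetical "web" class, 3 of "p2p"
    X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5), (0.2, 0.8),
         (5.0, 5.0), (5.5, 5.0), (5.0, 5.5)]
    y = ["web"] * 6 + ["p2p"] * 3
    print(imbalance_ratio(y))  # 2.0
    X_bal, y_bal = smote_multiclass(X, y, k=2)
    print(imbalance_ratio(y_bal))  # 1.0
```

Because every synthetic point lies on a segment between two real samples of the same class, SMOTE densifies the minority region instead of duplicating points, which is the usual argument for preferring it over plain random oversampling. Applying a binary technique class by class, as done here, is only one possible way to extend it to multiclass traffic datasets.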
Pages: 100-119
Number of pages: 20