Ensemble network traffic classification: Algorithm comparison and novel ensemble scheme proposal

Cited by: 41
Authors
Egea Gomez, Santiago [1]
Carro Martinez, Belen [1]
Sanchez-Esguevillas, Antonio J. [1]
Hernandez Callejo, Luis [1]
Affiliation
[1] Univ Valladolid, Escuela Tecn Super Ingn Telecomunicac, Campus Miguel Delibes, E-47011 Valladolid, Spain
Keywords
MACHINE LEARNING ALGORITHMS; NEURAL-NETWORKS
DOI
10.1016/j.comnet.2017.07.018
Chinese Library Classification
TP3 [Computing technology, computer technology]
Subject Classification Code
0812
Abstract
Network Traffic Classification (NTC) is a key component of network monitoring, Quality-of-Service management and network security. Machine Learning algorithms have drawn the attention of many researchers in recent years as a promising solution for network traffic classification. In Machine Learning, ensemble algorithms are classifiers formed by a set of base estimators that cooperate to build more complex models according to given training and classification strategies. The resulting models normally exhibit significant accuracy improvements over single estimators, but they also incur extra time cost, which may hinder the application of these methods to online NTC. This paper studies and compares the performance of seven popular ensemble algorithms based on Decision Trees, focusing on model accuracy, byte accuracy, and latency, to determine whether ensemble learning can be properly applied to this modeling task. We show that some of the studied algorithms outperform a single Decision Tree in terms of model accuracy and byte accuracy. However, the notable increase in latency hinders the application of these methods in real-time contexts. Additionally, we introduce a novel ensemble classifier that exploits the imbalanced class populations present in network traffic datasets to achieve faster classifications. The experimental results show that our scheme retains the accuracy improvements of ensemble methods with only a small latency penalty, enhancing the prospects of ensemble methods for online network traffic classification. (C) 2017 Elsevier B.V. All rights reserved.
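The abstract contrasts model accuracy with byte accuracy and describes ensembles as base estimators whose outputs are combined. The following is a minimal Python sketch of those two ideas, assuming that model accuracy is measured per flow, that byte accuracy weights each flow by its byte count, and that the ensemble combines hard labels by plurality vote (as in classic bagging; boosting methods use weighted votes instead). The function names and toy flows are hypothetical illustrations, not the paper's method or data.

```python
from collections import Counter
from typing import Sequence


def majority_vote(base_predictions: Sequence[Sequence[str]]) -> list[str]:
    """Combine hard label predictions from several base estimators.

    base_predictions[k][i] is the label that base estimator k assigns to flow i.
    Each flow gets the label with the most votes; ties go to the tied label
    encountered first when iterating over the estimators in order
    (Counter.most_common keeps first-encountered order for equal counts).
    """
    n_flows = len(base_predictions[0])
    combined: list[str] = []
    for i in range(n_flows):
        votes = Counter(preds[i] for preds in base_predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined


def flow_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of flows whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def byte_accuracy(y_true: Sequence[str], y_pred: Sequence[str],
                  flow_bytes: Sequence[int]) -> float:
    """Fraction of total traffic volume (bytes) carried by correctly labeled flows."""
    correct_bytes = sum(b for t, p, b in zip(y_true, y_pred, flow_bytes) if t == p)
    return correct_bytes / sum(flow_bytes)


# Hypothetical toy data: four flows, three base estimators.
y_true = ["web", "p2p", "dns", "web"]
flow_bytes = [1000, 8000, 100, 900]
base_predictions = [
    ["web", "p2p", "web", "dns"],  # estimator 1
    ["web", "web", "dns", "dns"],  # estimator 2
    ["p2p", "p2p", "web", "web"],  # estimator 3
]

ensemble_pred = majority_vote(base_predictions)
print(ensemble_pred)                                     # ['web', 'p2p', 'web', 'dns']
print(flow_accuracy(y_true, ensemble_pred))              # 0.5
print(byte_accuracy(y_true, ensemble_pred, flow_bytes))  # 0.9
```

In this toy case the ensemble mislabels half of the flows but only 10% of the bytes, because both errors fall on small flows; this is why flow-level and byte-level accuracy can tell different stories and are worth reporting separately.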
Pages: 68-80
Page count: 13