Multi class SVM algorithm with active learning for network traffic classification

Cited by: 95
Authors
Dong, Shi [1, 2]
Affiliations
[1] Zhoukou Normal Univ, Sch Comp Sci & Technol, Zhoukou 466001, Peoples R China
[2] Beijing Univ Posts & Telecommun, State Key Lab Networking & Switching Technol, Beijing 100876, Peoples R China
Keywords
Traffic classification; NETFLOW flow; Imbalance problem; Machine learning; SVM; CMSVM; IMBALANCED DATA; BOOSTED SVM;
DOI
10.1016/j.eswa.2021.114885
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With the massive amount of traffic now passing through the internet, internet service providers (ISPs) and networking service providers (NSPs) are looking for ways to accurately predict the application type of each flow. Such prediction is critical for security and network monitoring applications, which require the application type to be known in advance. Traditional port-based and payload-based analysis is no longer sufficient, as many applications now use dynamic or unknown port numbers, masquerading, and encryption to avoid detection. Recently, machine learning has gained significant attention in many prediction applications, including traffic classification from flow features or characteristics. However, such algorithms suffer from an imbalanced data problem: some applications have few flow samples and are hence difficult to predict. In this paper, we employ network flow-level characteristics to identify the application type of traffic. Furthermore, we propose an improved support vector machine (SVM) algorithm, named cost-sensitive SVM (CMSVM), to solve the imbalance problem in network traffic identification. CMSVM adopts a multi-class SVM algorithm with active learning, which dynamically assigns a weight to each application. We examine the classification accuracy and performance of the CMSVM algorithm on two datasets, MOORE_SET and NOC_SET. Our results show that, compared with other machine learning techniques, the CMSVM algorithm can reduce computation cost, improve classification accuracy, and solve the imbalance problem.
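The abstract does not say how CMSVM computes its per-application weights, so the following is only a minimal sketch of the general idea behind cost-sensitive weighting for imbalanced classes, not the paper's method. It uses the common inverse-frequency heuristic (weight = n_samples / (n_classes * class_count)); the function name `balanced_class_weights` and the flow labels are hypothetical, chosen for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a dict mapping each class to n_samples / (n_classes * class_count).

    Rare classes receive larger weights, so misclassifying them costs more
    when the weights are used as per-class penalties in a classifier.
    This is a generic heuristic, NOT the weighting scheme of CMSVM.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical, made-up flow labels: one dominant class and two rare ones.
labels = ["WWW"] * 8 + ["P2P"] + ["MAIL"]
weights = balanced_class_weights(labels)

for cls, w in sorted(weights.items()):
    print(f"{cls}: {w:.3f}")
```

Weights of this kind could be supplied as per-class misclassification costs to an SVM implementation that accepts them. A dynamic scheme such as the one the abstract mentions would presumably update them during training, but that step is not sketched here.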
Pages: 11
Related papers
28 in total
  • [1] MIMETIC: Mobile encrypted traffic classification using multimodal deep learning
    Aceto, Giuseppe
    Ciuonzo, Domenico
    Montieri, Antonio
    Pescape, Antonio
    [J]. COMPUTER NETWORKS, 2019, 165
  • [2] Mobile Encrypted Traffic Classification Using Deep Learning: Experimental Evaluation, Lessons Learned, and Challenges
    Aceto, Giuseppe
    Ciuonzo, Domenico
    Montieri, Antonio
    Pescape, Antonio
    [J]. IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, 2019, 16 (02) : 445 - 458
  • [3] Multi-classification approaches for classifying mobile app traffic
    Aceto, Giuseppe
    Ciuonzo, Domenico
    Montieri, Antonio
    Pescape, Antonio
    [J]. JOURNAL OF NETWORK AND COMPUTER APPLICATIONS, 2018, 103 : 131 - 145
  • [4] [Anonymous], 2011, IPV6 FLOW LABEL SPEC
  • [5] [Anonymous], 2005, ICML WORKSH ROC AN M
  • [6] A systematic study of the class imbalance problem in convolutional neural networks
    Buda, Mateusz
    Maki, Atsuto
    Mazurowski, Maciej A.
    [J]. NEURAL NETWORKS, 2018, 106 : 249 - 259
  • [7] Addressing imbalance in multilabel classification: Measures and random resampling algorithms
    Charte, Francisco
    Rivera, Antonio J.
    del Jesus, Maria J.
    Herrera, Francisco
    [J]. NEUROCOMPUTING, 2015, 163 : 3 - 16
  • [8] Hellinger distance decision trees are robust and skew-insensitive
    Cieslak, David A.
    Hoens, T. Ryan
    Chawla, Nitesh V.
    Kegelmeyer, W. Philip
    [J]. DATA MINING AND KNOWLEDGE DISCOVERY, 2012, 24 (01) : 136 - 158
  • [9] Issues and Future Directions in Traffic Classification
    Dainotti, Alberto
    Pescape, Antonio
    Claffy, Kimberly C.
    [J]. IEEE NETWORK, 2012, 26 (01) : 35 - 40
  • [10] Flow cluster algorithm based on improved K-means method
    Dong, Shi
    Zhou, Dingding
    Ding, Wei
    Gong, Jian
    [J]. IETE JOURNAL OF RESEARCH, 2013, 59 (04) : 326 - 333