A New Feature Selection Method for Internet Traffic Classification Using ML

Cited by: 17
Authors
Zhen, Liu [1]
Qiong, Liu [2]
Affiliations
[1] S China Univ Technol, Sch Comp Sci & Engn, Guangzhou, Guangdong, Peoples R China
[2] S China Univ Technol, Sch Software, Guangzhou, Peoples R China
Source
2012 INTERNATIONAL CONFERENCE ON MEDICAL PHYSICS AND BIOMEDICAL ENGINEERING (ICMPBE2012) | 2012 / Vol. 33
Keywords
machine learning; feature selection; multi-class imbalance; Internet traffic classification;
DOI
10.1016/j.phpro.2012.05.220
Chinese Library Classification number
Q6 [Biophysics];
Discipline classification code
071011 ;
Abstract
If 248 statistical features are used to characterize network traffic flows, the computational cost of the classifier becomes excessive. The feature selection methods referenced here improve accuracy on the majority classes at the cost of lower accuracy on the minority classes, which gives rise to a multi-class imbalance problem. This paper makes two main contributions. 1) An evaluation criterion based on information theory is proposed to assess how strongly a feature is biased towards one class. 2) A new feature selection method named BFS is proposed to reduce the number of features and alleviate multi-class imbalance. BFS was compared with the fast correlation-based filter (FCBF) and with the full feature set, using Naive Bayes on ten skewed datasets. The results show that 1) BFS maintains the balance of multi-class classification results better than FCBF; for example, the reduction in g-mean with BFS is only about 8%, and 2) the classification accuracy of Naive Bayes using BFS can reach 90%. (C) 2012 Published by Elsevier B.V. Selection and/or peer review under responsibility of ICMPBE International Committee.
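The abstract reports g-mean without defining it. The sketch below assumes the common multi-class definition, the geometric mean of per-class recalls, which may differ from the exact formula used in the paper. The function names and the toy labels are hypothetical and serve only to show why g-mean exposes a minority-class drop that overall accuracy hides.

```python
import math
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return {class: recall} for every class that appears in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls (assumed definition, see lead-in)."""
    recalls = list(per_class_recall(y_true, y_pred).values())
    return math.prod(recalls) ** (1.0 / len(recalls))


def accuracy(y_true, y_pred):
    """Fraction of flows whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy labels: one majority class and two minority classes.
y_true = ["WWW"] * 8 + ["MAIL"] * 2 + ["P2P"] * 2
# One MAIL flow is misclassified as WWW; everything else is correct.
y_pred = ["WWW"] * 8 + ["MAIL", "WWW"] + ["P2P", "P2P"]

print("accuracy:", accuracy(y_true, y_pred))
print("g-mean:  ", g_mean(y_true, y_pred))
```

In this toy case a single error on a minority class barely moves accuracy but pulls g-mean down noticeably, and g-mean falls to zero if any class is missed entirely.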
Pages: 1338-1345
Number of pages: 8