Feature Selection Using Information Gain for Improved Structural-Based Alert Correlation

Cited by: 97
Authors
Alhaj, Taqwa Ahmed [1]
Siraj, Maheyzah Md [1]
Zainal, Anazida [1]
Elshoush, Huwaida Tagelsir [2]
Elhaj, Fatin [1]
Affiliations
[1] Univ Teknol Malaysia, Fac Comp, Informat Assurance & Secur Res Grp, Johor Baharu, Johor, Malaysia
[2] Univ Khartoum, Fac Math Sci, Khartoum, Sudan
Source
PLOS ONE | 2016 / Vol. 11 / Issue 11
DOI
10.1371/journal.pone.0166017
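Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract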
Structural-based alert correlation groups and clusters intrusion detection alerts according to the similarity of their features, and it can reveal the list of steps that make up an attack. Previous researchers selected features and data sources manually, based on their own knowledge and experience, which led to less accurate identification of attack steps and inconsistent clustering accuracy. Furthermore, existing alert correlation systems deal with huge amounts of data containing null values, incomplete information, and irrelevant features, which makes alert analysis tedious, time-consuming, and error-prone. This paper therefore focuses on selecting accurate and significant alert features that appropriately represent the attack steps, thereby enhancing the structural-based alert correlation model. A two-tier feature selection method is proposed to obtain the significant features. The first tier ranks the subset of features in decreasing order of information gain, an entropy-based measure. The second tier extends the subset with additional features that have better discriminative ability than the initially ranked features. Performance analysis on the DARPA 2000 intrusion detection scenario-specific dataset shows the significance of the selected features in terms of clustering accuracy.
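The first tier described in the abstract, ranking features by information gain, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the alert fields (`signature`, `protocol`, `sensor`), and the toy attack-step labels are all hypothetical, and the sketch assumes discrete feature values and known step labels. The second tier is not shown, since the abstract does not specify how the additional features are chosen.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def information_gain(values, labels):
    """Label entropy minus the expected label entropy after splitting on a feature."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    total = len(labels)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, labels):
    """Return (feature, gain) pairs sorted by decreasing information gain."""
    features = rows[0].keys()
    gains = {f: information_gain([r[f] for r in rows], labels) for f in features}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Toy alerts with hypothetical values (not taken from the DARPA 2000 dataset).
alerts = [
    {"signature": "scan", "protocol": "tcp", "sensor": "s1"},
    {"signature": "scan", "protocol": "udp", "sensor": "s1"},
    {"signature": "overflow", "protocol": "tcp", "sensor": "s1"},
    {"signature": "overflow", "protocol": "udp", "sensor": "s1"},
]
steps = ["probe", "probe", "exploit", "exploit"]  # assumed attack-step labels

ranking = rank_features(alerts, steps)
```

In this toy data the `signature` field separates the two steps perfectly, so it ranks first, while `protocol` and `sensor` carry no information about the step and would be candidates for removal.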
Pages: 18