Machine-Learning-Based Feature Selection Techniques for Large-Scale Network Intrusion Detection

Cited by: 47
Authors
Al-Jarrah, O. Y. [1]
Siddiqui, A. [1]
Elsalamouny, M. [1]
Yoo, P. D. [1,2]
Muhaidat, S. [1,3]
Kim, K. [2]
Affiliations
[1] Khalifa Univ, Dept Elect & Comp Engn, Abu Dhabi, U Arab Emirates
[2] Korea Adv Inst Sci & Technol, Dept Comp Sci, Daejeon, South Korea
[3] Univ Surrey, Dept Elect Engn, Guildford, Surrey, England
Source
2014 IEEE 34TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS WORKSHOPS (ICDCSW) | 2014
Keywords
intrusion detection system; feature selection; machine learning; random forest
DOI
10.1109/ICDCSW.2014.14
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Cyber-attacks on major Internet sites and enterprise networks are increasingly common. An Intrusion Detection System (IDS) is a critical component of the defense mechanisms for such infrastructure: it monitors and analyzes network activity for potential intrusions and security attacks. Machine-learning (ML) models have been well accepted for signature-based IDSs due to their learnability and flexibility. However, the performance of existing IDSs does not seem satisfactory, owing to the rapid evolution of sophisticated cyber threats in recent decades. Moreover, the volume of data to be analyzed is beyond the ability of commonly used computer software and hardware tools; the data are not only large in scale but also arrive and leave at high velocity. In a big-data IDS, one must therefore find an efficient way to reduce both the dimensionality and the volume of the data. In this paper, we propose novel feature selection methods, namely RF-FSR (RandomForest-Forward Selection Ranking) and RF-BER (RandomForest-Backward Elimination Ranking). The features selected by the proposed methods were tested and compared with three of the most well-known feature sets in the IDS literature. The experimental results showed that the features selected by the proposed methods effectively improved the detection rate and false-positive rate, achieving 99.8% and 0.001%, respectively, on the well-known KDD-99 dataset.
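The abstract names forward selection and backward elimination over a feature ranking but does not spell out the procedure. The following is a minimal, generic sketch of those two wrapper strategies, not the authors' RF-FSR or RF-BER algorithms. It assumes the features have already been ranked (for instance by a random forest's importance scores) and that a `score` callable returns a quality measure for a feature subset (for instance cross-validated detection accuracy). The function names, the `toy_score` function, its weights, and the choice of five KDD-99 column names are all hypothetical, chosen only so the sketch runs on its own.

```python
from typing import Callable, List, Sequence

# A scorer maps a feature subset to a number; higher is better.
# In practice this might be the cross-validated accuracy of a random forest
# trained on those columns (an assumption, not stated in the abstract).
Score = Callable[[Sequence[str]], float]


def forward_selection(ranked: Sequence[str], score: Score) -> List[str]:
    """Walk the ranking best-first; keep a feature only if it raises the score."""
    selected: List[str] = []
    best = float("-inf")
    for feat in ranked:
        trial = selected + [feat]
        s = score(trial)
        if s > best:
            selected, best = trial, s
    return selected


def backward_elimination(ranked: Sequence[str], score: Score) -> List[str]:
    """Start with every feature; try dropping the worst-ranked ones first and
    keep each drop that does not lower the score."""
    selected = list(ranked)
    best = score(selected)
    for feat in reversed(ranked):
        if len(selected) == 1:
            break  # never return an empty feature set
        trial = [f for f in selected if f != feat]
        s = score(trial)
        if s >= best:
            selected, best = trial, s
    return selected


# Hypothetical ranking, best first, using KDD-99 column names for illustration.
RANKED = ["src_bytes", "dst_bytes", "count", "duration", "land"]

# Made-up usefulness weights: unlisted features slightly hurt the score.
_TOY_WEIGHTS = {"src_bytes": 0.5, "dst_bytes": 0.3, "count": 0.15}


def toy_score(subset: Sequence[str]) -> float:
    """Stand-in scorer so the sketch runs without a dataset or an ML library."""
    return sum(_TOY_WEIGHTS.get(f, -0.01) for f in subset)


if __name__ == "__main__":
    print(forward_selection(RANKED, toy_score))
    print(backward_elimination(RANKED, toy_score))
```

With a real dataset, `toy_score` would be replaced by a function that trains and evaluates a classifier on the given columns. Each call then costs one model fit, so both wrappers need at most one fit per ranked feature.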
Pages: 177-181
Page count: 5