Analysis of network traffic features for anomaly detection

Cited by: 1
Authors
Félix Iglesias
Tanja Zseby
Affiliation
[1] Vienna University of Technology, Institute of Telecommunications
Source
Machine Learning | 2015 / Volume 101
Keywords
Feature selection; Anomaly detection; Network security; Data preprocessing; Supervised classification
DOI
Not available
Abstract
Anomaly detection in communication networks provides the basis for uncovering novel attacks, misconfigurations and network failures. Resource constraints for data storage, transmission and processing make it beneficial to restrict input data to features that are (a) highly relevant for the detection task and (b) easily derivable from network observations without expensive operations. Removing strongly correlated, redundant and irrelevant features also improves the detection quality for many algorithms that are based on learning techniques. In this paper we address the feature selection problem for network-traffic-based anomaly detection. We propose a multi-stage feature selection method using filters and stepwise regression wrappers. Our analysis is based on 41 widely adopted traffic features that are present in several commonly used traffic data sets. With our combined feature selection method we reduced the original feature vectors from 41 to only 16 features. We tested our results with five fundamentally different classifiers and observed no significant reduction of the detection performance. In order to quantify the practical benefits of our results, we analyzed the costs of generating individual features from standard IP Flow Information Export records, which are available at many routers. We show that we can eliminate 13 very costly features and thus reduce the computational effort for on-line feature generation from live traffic observations at network nodes.
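The abstract names a multi-stage method (filters followed by stepwise regression wrappers) without giving the procedure. As a rough illustration only, the sketch below assumes a Pearson-correlation filter followed by greedy forward stepwise linear regression, run on synthetic data. The function names, the thresholds (0.95 and 0.01) and the toy features are all hypothetical and are not taken from the paper.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def centre(col):
    """Subtract the mean from every value of a column."""
    m = sum(col) / len(col)
    return [v - m for v in col]


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either column is constant."""
    ca, cb = centre(a), centre(b)
    denom = math.sqrt(dot(ca, ca) * dot(cb, cb))
    return dot(ca, cb) / denom if denom > 0 else 0.0


def correlation_filter(columns, threshold=0.95):
    """Filter stage (assumed): keep a feature only if it is not strongly
    correlated with any feature kept before it."""
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) < threshold for k in kept):
            kept.append(j)
    return kept


def forward_stepwise(columns, y, candidates, min_gain=0.01):
    """Wrapper stage (assumed): greedy forward stepwise linear regression.
    Each round adds the candidate that most reduces the residual sum of
    squares; stops when that reduction is below min_gain of the total."""
    resid = centre(y)
    total = dot(resid, resid)
    basis = {j: centre(columns[j]) for j in candidates}
    selected = []

    def gain(j):
        q = basis[j]
        qq = dot(q, q)
        return dot(q, resid) ** 2 / qq if qq > 1e-9 else 0.0

    while basis and total > 0:
        best = max(basis, key=gain)
        if gain(best) / total < min_gain:
            break
        q = basis.pop(best)
        qq = dot(q, q)
        coef = dot(q, resid) / qq
        resid = [r - coef * v for r, v in zip(resid, q)]
        # Orthogonalise the remaining candidates against the chosen one so
        # later gains measure only the additional variance they explain.
        for j in basis:
            proj = dot(basis[j], q) / qq
            basis[j] = [a - proj * b for a, b in zip(basis[j], q)]
        selected.append(best)
    return selected


# Synthetic toy data (hypothetical): f0 and f2 drive the label, f1 is a
# near-duplicate of f0, f3 is pure noise, f4 is constant.
random.seed(0)
n = 300
f0 = [random.gauss(0, 1) for _ in range(n)]
f2 = [random.gauss(0, 1) for _ in range(n)]
f1 = [v + 0.01 * random.gauss(0, 1) for v in f0]
f3 = [random.gauss(0, 1) for _ in range(n)]
f4 = [1.0] * n
columns = [f0, f1, f2, f3, f4]
labels = [1.0 if a + b > 0 else 0.0 for a, b in zip(f0, f2)]

kept = correlation_filter(columns)
selected = forward_stepwise(columns, labels, kept)
print("after filter:", kept)
print("after wrapper:", selected)
```

In this toy setup the filter is expected to discard the near-duplicate feature, and the wrapper then ranks the survivors by how much extra label variance each one explains. A real pipeline would validate the reduced feature set with held-out data and several classifiers, as the abstract describes.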
Pages: 59-84
Number of pages: 25