Casting out demons: Sanitizing training data for anomaly sensors

Cited by: 114
Authors
Cretu, Gabriela F. [1]
Stavrou, Angelos [2]
Locasto, Michael E. [3]
Stolfo, Salvatore J. [1]
Affiliations
[1] Columbia Univ, Dept Comp Sci, New York, NY 10027 USA
[2] George Mason Univ, Dept Comp Sci, Fairfax, VA 22030 USA
[3] Dartmouth Coll, Inst Secur Technol Studies, Hanover, NH 03755 USA
Source
PROCEEDINGS OF THE 2008 IEEE SYMPOSIUM ON SECURITY AND PRIVACY | 2008
Funding
National Science Foundation (USA)
DOI
10.1109/SP.2008.11
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The efficacy of Anomaly Detection (AD) sensors depends heavily on the quality of the data used to train them. Artificial or contrived training data may not provide a realistic view of the deployment environment. Most realistic data sets are dirty; that is, they contain a number of attacks or anomalous events. The size of these high-quality training data sets makes manual removal or labeling of attack data infeasible. As a result, sensors trained on this data can miss attacks and their variations. We propose extending the training phase of AD sensors (in a manner agnostic to the underlying AD algorithm) to include a sanitization phase. This phase generates multiple models conditioned on small slices of the training data. We use these "micro-models" to produce provisional labels for each training input, and we combine the micro-models in a voting scheme to determine which parts of the training data may represent attacks. Our results suggest that this phase automatically and significantly improves the quality of unlabeled training data by making it as "attack-free" and "regular" as possible in the absence of absolute ground truth. We also show how a collaborative approach that combines models from different networks or domains can further refine the sanitization process to thwart targeted training or mimicry attacks against a single site.
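The sanitization phase described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy "sensor" (a set of previously seen items), the fixed-size slicing, the unweighted majority vote, and the names `train_micro_model`, `is_anomalous`, `sanitize`, `slice_size`, and `vote_threshold` are all hypothetical. Any AD algorithm that can be trained on a slice and queried per input could be substituted.

```python
from typing import List, Sequence, Set, Tuple


def train_micro_model(data_slice: Sequence[str]) -> Set[str]:
    """Toy stand-in for an AD sensor: remember every item seen in the slice."""
    return set(data_slice)


def is_anomalous(model: Set[str], item: str) -> bool:
    """Provisional label from one micro-model: unseen items are anomalous."""
    return item not in model


def sanitize(
    data: Sequence[str], slice_size: int, vote_threshold: float
) -> Tuple[List[str], List[str]]:
    """Split data into (sanitized, suspect) using micro-model voting.

    Assumption: unweighted voting; an item is suspect when the fraction of
    micro-models labeling it anomalous exceeds vote_threshold.
    """
    # Build one micro-model per small, contiguous slice of the training data.
    slices = [data[i:i + slice_size] for i in range(0, len(data), slice_size)]
    models = [train_micro_model(s) for s in slices]
    if not models:
        return [], []

    sanitized: List[str] = []
    suspect: List[str] = []
    for item in data:
        # Each micro-model casts one vote on the item.
        votes = sum(is_anomalous(m, item) for m in models)
        if votes / len(models) > vote_threshold:
            suspect.append(item)
        else:
            sanitized.append(item)
    return sanitized, suspect
```

Under these assumptions, an input that appears in only one slice is flagged by every other micro-model and lands in the suspect set, while regular traffic seen across slices is kept. The sanitized set would then be used to train the final sensor.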
Pages: 81 / +
Page count: 3