AMAL: High-fidelity, behavior-based automated malware analysis and classification

Cited by: 144
Authors
Mohaisen, Aziz [1]
Alrawi, Omar [2]
Mohaisen, Manar [3]
Affiliations
[1] Verisign Labs, Bozeman, MT USA
[2] Qatar Fdn, QCRI, Doha, Qatar
[3] Korea Tech, Cheonan, South Korea
Keywords
Malware; Classification; Automatic analysis; Clustering; Machine learning; Dynamic analysis
DOI
10.1016/j.cose.2015.04.001
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper introduces AMAL, an automated, behavior-based malware analysis and labeling system that addresses shortcomings of existing systems. AMAL consists of two sub-systems, AutoMal and MaLabel. AutoMal provides tools to collect low granularity behavioral artifacts that characterize malware usage of the file system, memory, network, and registry, and does so by running malware samples in virtualized environments. MaLabel, in turn, uses those artifacts to create representative features, builds classifiers from them using manually vetted training samples, and uses those classifiers to classify malware samples into families with similar behavior. AutoMal also enables unsupervised learning by implementing multiple clustering algorithms for grouping samples. An evaluation of both AutoMal and MaLabel based on medium-scale (4000 samples) and large-scale (more than 115,000 samples) datasets collected and analyzed by AutoMal over 13 months shows AMAL's effectiveness in accurately characterizing, classifying, and grouping malware samples. MaLabel achieves a precision of 99.5% and a recall of 99.6% for the classification of certain families, and more than 98% precision and recall for unsupervised clustering. Several benchmarks, cost estimates, and measurements highlight the merits of AMAL. © 2015 Elsevier Ltd. All rights reserved.
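For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how per-family precision and recall are conventionally computed from ground-truth and predicted labels. It is not taken from the paper: the function name `precision_recall`, the family names, and the label lists are all hypothetical and serve only as an illustration.

```python
from collections import Counter


def precision_recall(true_labels, predicted_labels, family):
    """Return (precision, recall) for one family, treated as the positive class."""
    counts = Counter()
    for truth, guess in zip(true_labels, predicted_labels):
        if guess == family and truth == family:
            counts["tp"] += 1  # correctly labeled as the family
        elif guess == family:
            counts["fp"] += 1  # labeled as the family, but belongs elsewhere
        elif truth == family:
            counts["fn"] += 1  # belongs to the family, but was missed
    predicted_pos = counts["tp"] + counts["fp"]
    actual_pos = counts["tp"] + counts["fn"]
    precision = counts["tp"] / predicted_pos if predicted_pos else 0.0
    recall = counts["tp"] / actual_pos if actual_pos else 0.0
    return precision, recall


# Hypothetical labels for six samples (not data from the paper).
truth = ["family_a", "family_a", "family_a", "other", "other", "family_a"]
guess = ["family_a", "family_a", "other", "family_a", "other", "family_a"]
p, r = precision_recall(truth, guess, "family_a")
print(f"precision={p:.2f} recall={r:.2f}")
```

Precision is the share of samples labeled as the family that truly belong to it, and recall is the share of the family's samples that were found. In this toy input there are three true positives, one false positive, and one false negative.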
Pages: 251-266
Page count: 16