Multiple instance learning for malware classification

Cited by: 46
Authors
Stiborek, Jan [1 ,2 ]
Pevny, Tomas [1 ,2 ]
Rehak, Martin [1 ,2 ]
Affiliations
[1] Cisco Syst Inc, 170 West Tasman Dr, San Jose, CA 95134 USA
[2] Czech Tech Univ, FEE, Dept Comp Sci, Karlovo Namesti 13, Prague 12135, Czech Republic
Keywords
Malware; Dynamic analysis; Sandboxing; Multiple instance learning; Classification; Random forest; FRAMEWORK; ALGORITHM;
DOI
10.1016/j.eswa.2017.10.036
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This work addresses the classification of unknown binaries executed in a sandbox by modeling their interaction with system resources (files, mutexes, registry keys, and communication with servers over the network) and the error messages provided by the operating system, using a vocabulary-based method from the multiple instance learning paradigm. It introduces similarities suitable for the individual resource types that, combined with an approximate clustering method, efficiently group the system resources and define features directly from data. This approach effectively removes the randomization often employed by malware authors and projects samples into a low-dimensional feature space suitable for common classifiers. An extensive comparison to the state of the art on a large corpus of binaries demonstrates that the proposed solution achieves superior results using only a fraction of the training samples. Moreover, it makes use of a source of information different from that used by most of the prior art, which increases the diversity of tools detecting the malware and hence makes detection evasion more difficult. (C) 2017 Elsevier Ltd. All rights reserved.
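The vocabulary-based idea in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the token-based Jaccard similarity, the greedy "leader" clustering used as a stand-in for the approximate clustering, the 0.5 threshold, and the function names `tokens`, `similarity`, `build_vocabulary`, and `encode_bag`. Each sample is treated as a bag of resource names, similar names are grouped under one prototype, and each bag becomes a fixed-length count vector.

```python
import re
from typing import List


def tokens(name: str) -> frozenset:
    """Split a resource name on separators and mask digit runs (toy normalisation)."""
    parts = re.split(r"[\\/._-]+", name.lower())
    return frozenset(re.sub(r"\d+", "#", p) for p in parts if p)


def similarity(a: str, b: str) -> float:
    """Jaccard similarity of token sets; a hypothetical stand-in for the paper's similarities."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def build_vocabulary(instances: List[str], threshold: float = 0.5) -> List[str]:
    """Greedy leader clustering: an instance becomes a new prototype
    only if it is not similar enough to any existing prototype."""
    prototypes: List[str] = []
    for inst in instances:
        if all(similarity(inst, p) < threshold for p in prototypes):
            prototypes.append(inst)
    return prototypes


def encode_bag(bag: List[str], vocabulary: List[str], threshold: float = 0.5) -> List[int]:
    """Represent a bag (one sample) as counts over its nearest prototypes."""
    vec = [0] * len(vocabulary)
    for inst in bag:
        sims = [similarity(inst, p) for p in vocabulary]
        if not sims:
            continue
        best = max(range(len(sims)), key=sims.__getitem__)
        if sims[best] >= threshold:
            vec[best] += 1
    return vec


# Two toy samples whose temporary file names differ only in random digits.
bag_a = ["C:\\Temp\\a1234.tmp", "C:\\Windows\\system32\\kernel32.dll"]
bag_b = ["C:\\Temp\\a9876.tmp"]
vocabulary = build_vocabulary(bag_a + bag_b)
features = [encode_bag(bag, vocabulary) for bag in (bag_a, bag_b)]
```

In this toy setting the two randomized temporary file names fall under the same prototype, so both bags share a feature despite the differing digits. The resulting vectors could then be passed to a common classifier such as a random forest, which the keywords mention.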
Pages: 346-357
Number of pages: 12