HYDRA: A multimodal deep learning framework for malware classification

Cited by: 88
Authors
Gibert, Daniel [1]
Mateu, Carles [1]
Planes, Jordi [1]
Affiliations
[1] Univ Lleida, Jaume II 69, Lleida, Spain
Keywords
Malware classification; Machine learning; Deep learning; Feature fusion; Multimodal learning; Entropy
DOI
10.1016/j.cose.2020.101873
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
While traditional machine learning methods for malware detection largely depend on hand-designed features, which are based on experts' knowledge of the domain, end-to-end learning approaches take the raw executable as input and try to learn a set of descriptive features from it. The latter, however, may perform poorly on problems where little data is available or where the dataset is imbalanced. In this paper we present HYDRA, a novel framework that addresses the task of malware detection and classification by combining various types of features to discover the relationships between distinct modalities. Our approach learns from various sources to maximize the benefits of multiple feature types in reflecting the characteristics of malware executables. We propose a baseline system that consists of both hand-engineered and end-to-end components, combining the benefits of feature engineering and deep learning so that malware characteristics are effectively represented. An extensive analysis of state-of-the-art methods on the Microsoft Malware Classification Challenge benchmark shows that the proposed solution achieves results comparable to gradient boosting methods in the literature and a higher yield than deep learning approaches. © 2020 Elsevier Ltd. All rights reserved.
Pages: 16