Malware Visualization for Fine-Grained Classification

Cited by: 91
Authors
Fu, Jianwen [1]
Xue, Jingfeng [1]
Wang, Yong [1]
Liu, Zhenyan [1]
Shan, Chun [1]
Affiliations
[1] Beijing Inst Technol, Sch Software, Beijing 100081, Peoples R China
Keywords
Malware visualization; fine-grained classification; RGB-colored image;
DOI
10.1109/ACCESS.2018.2805301
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Due to the rapid rise of automated tools, the number of malware variants has increased dramatically, which poses a tremendous threat to the security of the Internet. Recently, some methods for quick analysis of malware have been proposed, but these methods usually incur a large computational overhead and cannot classify samples accurately on large-scale, complex malware data sets. Therefore, in this paper, we propose a new visualization method that characterizes malware both globally and locally to achieve fast and effective fine-grained classification. We visualize malware as RGB-colored images and extract global features from the images. The gray-level co-occurrence matrix and color moments are selected to describe the global texture features and color features, respectively, which produces low-dimensional feature data and reduces the complexity of model training. Moreover, a series of special byte sequences is extracted from the code sections and data sections of malware and processed into feature vectors by Simhash to serve as the local features. Finally, we merge the global features and local features to perform malware classification using random forest, K-nearest neighbor, and support vector machine. Experimental results show that our approach obtains a highest accuracy of 97.47% and a highest F-measure of 96.85% on 7087 samples from 15 families. Color features and local features effectively assist the classification based on texture features and enhance the F-measure by 3.4% and 1%, respectively. Overall, the combination of global features and local features can realize fine-grained malware classification with low computational cost.
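As a rough illustration of two of the feature types named in the abstract, the sketch below computes color moments from an RGB pixel list and a Simhash fingerprint from a list of byte sequences, then concatenates them into one feature vector. It is not the authors' implementation. The abstract does not say how bytes are mapped to pixels or which byte sequences are selected, so the three-bytes-per-pixel mapping in `bytes_to_pixels`, the MD5-based token hash, the 64-bit fingerprint width, and all function names are assumptions made for this example. The gray-level co-occurrence matrix step and the classifiers are omitted.

```python
import hashlib
import math


def bytes_to_pixels(data: bytes):
    """Assumed mapping: every 3 consecutive bytes form one (R, G, B) pixel.

    Trailing bytes are zero-padded. The paper's real mapping may differ.
    """
    padded = data + b"\x00" * (-len(data) % 3)
    return [tuple(padded[i:i + 3]) for i in range(0, len(padded), 3)]


def color_moments(pixels):
    """Return [mean, std, skewness] for each of R, G, B (9 values).

    Expects a non-empty list of (R, G, B) tuples.
    """
    features = []
    n = len(pixels)
    for c in range(3):
        values = [p[c] for p in pixels]
        mean = sum(values) / n
        var = sum((v - mean) ** 2 for v in values) / n
        third = sum((v - mean) ** 3 for v in values) / n
        # Signed cube root of the third central moment.
        skew = math.copysign(abs(third) ** (1 / 3), third)
        features += [mean, math.sqrt(var), skew]
    return features


def simhash(tokens, bits=64):
    """Simhash over byte-string tokens; each token has weight 1."""
    weights = [0] * bits
    for tok in tokens:
        # Assumption: MD5 truncated to `bits` bits as the per-token hash.
        h = int.from_bytes(hashlib.md5(tok).digest()[:bits // 8], "big")
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    # A fingerprint bit is set when its accumulated weight is positive.
    return sum(1 << i for i in range(bits) if weights[i] > 0)


def simhash_bits(fingerprint, bits=64):
    """Expand a fingerprint into a 0/1 list usable as a feature vector."""
    return [(fingerprint >> i) & 1 for i in range(bits)]


def merged_features(data: bytes, tokens, bits=64):
    """Concatenate global (color) and local (Simhash) features."""
    global_part = color_moments(bytes_to_pixels(data))
    local_part = simhash_bits(simhash(tokens, bits), bits)
    return global_part + local_part
```

With these assumptions, `merged_features(sample_bytes, sequences)` yields a 73-dimensional vector (9 color moments plus 64 Simhash bits) that could be passed to any standard classifier. Keeping the fingerprint as individual bits rather than a single integer lets distance-based classifiers such as K-nearest neighbor treat bit differences as Hamming-style distances.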
Pages: 14510-14523
Number of pages: 14