An enhancement for image-based malware classification using machine learning with low dimension normalized input images

Cited by: 14
Authors
Son, Tran The [1]
Lee, Chando [2]
Le-Minh, Hoa [3]
Aslam, Nauman [3]
Dat, Vuong Cong [1]
Affiliations
[1] Vietnam Korea Univ Informat & Commun Technol, Da Nang, Vietnam
[2] Natl IT Promot Agcy NIPA, Seoul, South Korea
[3] Northumbria Univ, Dept Math Phys & Elect Engn, Newcastle Upon Tyne, Northumberland, England
Keywords
Image-based Malware Classification; k-NN; SVM; CNN; GIST descriptor; SCENE;
DOI
10.1016/j.jisa.2022.103308
CLC number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
This paper proposes a simple and effective model for image-based malware classification using machine learning, in which malware images (converted from malware binary files) are fed directly into the classifiers, i.e. k-nearest neighbour (k-NN), support vector machine (SVM) and convolutional neural network (CNN). Unlike existing models in the literature, the proposed model does not train the classifiers on normalized fixed-size square images (e.g. 64 x 64 pixels) or on features extracted by an image descriptor (e.g. GIST). Instead, the input images are normalized and scaled down horizontally (along the image width) to 32 x 64, 16 x 64 or even 8 x 64 pixels, a lower dimension than the square images, to reduce the complexity and training time of the model. This is based on the observation, analysed in this paper, that the texture of a malware image is mainly vertically distributed. This finding is significant for training on devices with limited computational resources, such as IoT devices. Experiments were conducted with k-NN, SVM and CNN classifiers on the Malimg and Malheur datasets, which contain 9339 samples (25 malware families) and 3133 variant samples (24 malware families), respectively. The results show that the dimension of the input images can be reduced (to 32 x 64, 16 x 64 or even 8 x 64) while retaining the same classification accuracy as that obtained by a classifier fed with fixed-size square images (64 x 64 pixels). As a result, the training time of the proposed model falls to a half, a quarter and one-eighth, respectively, of the training time taken by the same classifier (k-NN, SVM or CNN) fed with 64 x 64 square images.
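The preprocessing described in the abstract (binary file to grayscale image, then width reduction and normalization) might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the raw row width of 64 bytes, the nearest-neighbour resampling, the scaling of pixels to [0, 1], and the reading of "32 x 64" as width x height are all assumptions that the abstract does not confirm.

```python
from typing import List

Image = List[List[float]]


def bytes_to_image(data: bytes, width: int = 64) -> List[List[int]]:
    """Lay the bytes out row by row as a grayscale image (one byte = one pixel).

    The last row is zero-padded. The row width of 64 is an assumption.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if not data:
        raise ValueError("data must not be empty")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row += [0] * (width - len(row))  # pad a short final row
        rows.append(row)
    return rows


def resize_nearest(img: List[List[int]], out_h: int, out_w: int) -> List[List[int]]:
    """Nearest-neighbour resampling (assumed; the abstract names no method)."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[(r * in_h) // out_h][(c * in_w) // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def malware_to_input(data: bytes, out_h: int = 64, out_w: int = 32) -> Image:
    """Convert a binary to a normalized out_h x out_w image with values in [0, 1]."""
    small = resize_nearest(bytes_to_image(data), out_h, out_w)
    return [[pixel / 255.0 for pixel in row] for row in small]


if __name__ == "__main__":
    fake_binary = bytes(range(256)) * 64  # stand-in for a malware file
    image = malware_to_input(fake_binary, out_h=64, out_w=32)
    features = [pixel for row in image for pixel in row]
    print(len(image), len(image[0]), len(features))
```

Flattening the 64-row by 32-column image yields 2048 features instead of the 4096 of a 64 x 64 square image, which matches the halving of input dimension that the abstract links to shorter training time. The flattened vector could then be passed to any of the classifiers named above.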
Pages: 13