SDIF-CNN: Stacking deep image features using fine-tuned convolution neural network models for real-world malware detection and classification

Cited by: 9
Authors
Kumar, Sanjeev [1]
Panda, Kajal [1]
Affiliations
[1] Ctr Dev Adv Comp C DAC, Cyber Secur Technol Div CSTD, Mohali, India
Keywords
Malware detection; Machine learning; Convolutional neural networks; Deep learning; Cybersecurity; VISUALIZATION;
DOI
10.1016/j.asoc.2023.110676
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract
Malware detection is a complex problem in Internet security, and there is a need for a malware defense system that can detect large-scale malware at low cost. This paper proposes SDIF-CNN (Stacking deep image features using fine-tuned convolution neural networks), a novel malware detection and classification architecture based on image visualization. The design is a hybrid transfer-learning methodology that uses deep convolutional neural network models both for fine-tuning and as feature extractors. First, the pre-trained VGG16 CNN model is deeply fine-tuned with different hyperparameters, including the number of layers, learning rate, and momentum. The fine-tuned VGG16 model is then used as a feature extractor alongside three similar pre-trained CNN models, VGG19, ResNet50, and InceptionV3, to obtain diverse feature maps. The extracted features are horizontally concatenated to construct a single feature map. Different feature selection methodologies, including filter-based methods and embedded methods such as linear regression and random forest, are applied to discard irrelevant features from the stacked feature map. The study then trains six machine learning and deep learning classifiers, K-Nearest Neighbor (K-NN), Support Vector Machine (SVM), Random Forest (RF), Multi-Layer Perceptron (MLP), Extra Tree (ET), and Gaussian Naive Bayes (GNB), using the stacked feature map as the training feature vector. The hyperparameters of the MLP model, the best classifier, are optimized with a randomized search algorithm to devise an optimal classifier. The experiments use the public benchmark MalImg dataset of 9339 images from 25 families. The model is also validated on real-world and packed malicious programs to demonstrate that the proposed methodology generalizes to detecting real-world malware. In the proposed system, the MLP model obtained the best performance: 98.55% accuracy, 99% precision, 99% recall, and 99% F1-score on the MalImg dataset, and 94.78% accuracy on real-world malware datasets. The proposed methodology is resilient to commonly used obfuscation techniques and does not depend on code disassembly, reverse engineering analysis, or highly resource-intensive dynamic analysis. © 2023 Elsevier B.V. All rights reserved.
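
The stacking and filtering steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `stack_features` and `select_by_variance`, the toy feature values, and the choice of a variance threshold as the filter-based method are all assumptions made for illustration, since the abstract does not name the specific filter used. Real feature maps would come from the CNN backbones and contain thousands of columns per sample.

```python
from statistics import pvariance


def stack_features(*feature_maps):
    """Horizontally concatenate per-model feature matrices (one row per sample)."""
    n_samples = len(feature_maps[0])
    if any(len(fm) != n_samples for fm in feature_maps):
        raise ValueError("every feature map needs one row per sample")
    # Row i of the result is row i of each model's features, joined end to end.
    return [
        [value for fm in feature_maps for value in fm[i]]
        for i in range(n_samples)
    ]


def select_by_variance(stacked, threshold=1e-9):
    """Hypothetical filter-based selection: drop near-constant columns."""
    columns = list(zip(*stacked))
    kept = [j for j, col in enumerate(columns) if pvariance(col) > threshold]
    reduced = [[row[j] for j in kept] for row in stacked]
    return reduced, kept


# Toy stand-ins for features extracted by two backbones (3 samples each).
vgg16_features = [[0.1, 0.9], [0.4, 0.9], [0.7, 0.9]]
resnet50_features = [[1.0, 0.0, 0.5], [1.0, 0.2, 0.5], [1.0, 0.4, 0.5]]

stacked = stack_features(vgg16_features, resnet50_features)
reduced, kept = select_by_variance(stacked)
print(len(stacked[0]), kept)
```

In this toy run the stacked map has five columns, and the three constant columns are discarded. The reduced matrix would then serve as the training feature vector for a downstream classifier such as the MLP.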
Pages: 19