CNN-LSTM and transfer learning models for malware classification based on opcodes and API calls

Cited by: 8
Authors
Bensaoud, Ahmed [1]
Kalita, Jugal [1]
Affiliations
[1] Univ Colorado, Dept Comp Sci, Colorado Springs, CO 80918 USA
Keywords
Malware classification; Long short-term memory (LSTM); Opcode; Natural language processing (NLP); API calls; Convolutional neural network; FRAMEWORK;
DOI
10.1016/j.knosys.2024.111543
CLC number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we propose a novel malware classification system based on Application Programming Interface (API) calls and opcodes, with the goal of improving classification accuracy. The system uses a novel design that combines a Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM). We extract opcode sequences and API calls from Windows malware samples and transform these features into N-gram sequences (N = 2, 3, and 10). In experiments on a dataset of 9,749,57 samples, the model reaches an accuracy of 99.91% using the 8-gram sequences. Our method significantly improves malware classification performance across a wide range of recent deep learning architectures, leading to state-of-the-art performance. In particular, we experiment with ConvNeXt-T, ConvNeXt-S, RegNetY-4GF, RegNetY-8GF, RegNetY-12GF, EfficientNetV2, Sequencer2D-L, Swin-T, ViT-G/14, ViT-Ti, ViT-S, ViT-B, ViT-L, and MaxViT-B. Among these, Swin-T and Sequencer2D-L achieve accuracies of 99.82% and 99.70%, respectively, which is comparable to our CNN-LSTM architecture but does not surpass it.
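The N-gram transformation mentioned in the abstract can be illustrated with a minimal sketch. The opcode trace and the `ngrams` helper below are hypothetical and are not taken from the paper; the abstract does not describe the authors' tokenization or preprocessing, so this only shows the general sliding-window idea.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) of a token sequence."""
    if n <= 0:
        raise ValueError("n must be positive")
    # A sequence of length L yields L - n + 1 n-grams (none if L < n).
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical opcode trace, used purely for illustration.
opcodes = ["push", "mov", "call", "push", "mov", "call", "ret"]

bigrams = ngrams(opcodes, 2)
trigrams = ngrams(opcodes, 3)

# Frequency counts are one common way to turn n-grams into features;
# this is an assumption, not necessarily what the paper does.
bigram_counts = Counter(bigrams)
```

The same helper would apply unchanged to a sequence of API call names, since it only assumes a list of tokens.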
Pages: 14