CNN-LSTM and transfer learning models for malware classification based on opcodes and API calls

Times cited: 8

Authors:

Bensaoud, Ahmed [1]

Kalita, Jugal [1]

Affiliation:

[1] Univ Colorado, Dept Comp Sci, Colorado Springs, CO 80918 USA

Source:

KNOWLEDGE-BASED SYSTEMS | 2024 / Vol. 290

Keywords:

Malware classification; Long short-term memory (LSTM); Opcode; Natural language processing (NLP); API calls; Convolutional neural network; FRAMEWORK;

DOI:

10.1016/j.knosys.2024.111543

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In this paper, we propose a novel malware classification system based on Application Programming Interface (API) calls and opcodes to improve classification accuracy. The system uses a new architecture that combines a Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM). We extract opcode sequences and API calls from Windows malware samples and transform these features into N-gram sequences (N = 2, 3, and 10). Our experiments on a dataset of 9,749,57 samples achieve an accuracy of 99.91% using 8-gram sequences. Our method also significantly improves malware classification performance across a wide range of recent deep learning architectures, achieving state-of-the-art results. In particular, we experiment with ConvNeXt-T, ConvNeXt-S, RegNetY-4GF, RegNetY-8GF, RegNetY-12GF, EfficientNetV2, Sequencer2D-L, Swin-T, ViT-G/14, ViT-Ti, ViT-S, ViT-B, ViT-L, and MaxViT-B. Among these architectures, Swin-T and Sequencer2D-L achieved high accuracies of 99.82% and 99.70%, respectively; these are comparable to, but do not surpass, our CNN-LSTM architecture.
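The abstract does not spell out how opcode or API-call sequences become N-gram inputs. The following is a minimal sketch, assuming a stride-1 sliding window over the token sequence, a sorted n-gram vocabulary, and fixed-length integer encoding of the kind typically fed to an embedding layer ahead of a CNN-LSTM. The function names (`extract_ngrams`, `build_vocab`, `encode`), the padding/unknown IDs, and the sample opcodes are hypothetical illustrations, not the authors' pipeline.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

NGram = Tuple[str, ...]
PAD_ID = 0  # hypothetical: reserved for padding positions
UNK_ID = 1  # hypothetical: reserved for n-grams missing from the vocabulary


def extract_ngrams(tokens: Sequence[str], n: int) -> List[NGram]:
    """Return contiguous n-grams using a sliding window with stride 1."""
    if n < 1:
        raise ValueError("n must be >= 1")
    # A sequence shorter than n yields an empty list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_vocab(samples: Sequence[Sequence[str]], n: int,
                min_count: int = 1) -> Dict[NGram, int]:
    """Assign integer IDs (starting at 2) to n-grams seen at least min_count times."""
    counts = Counter(g for s in samples for g in extract_ngrams(s, n))
    frequent = sorted(g for g, c in counts.items() if c >= min_count)
    return {g: i + 2 for i, g in enumerate(frequent)}


def encode(tokens: Sequence[str], vocab: Dict[NGram, int],
           n: int, max_len: int) -> List[int]:
    """Map a sample to a fixed-length ID list, truncating or padding with PAD_ID."""
    ids = [vocab.get(g, UNK_ID) for g in extract_ngrams(tokens, n)]
    ids = ids[:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


if __name__ == "__main__":
    opcodes = ["push", "mov", "call", "mov", "call", "ret"]  # hypothetical sample
    vocab = build_vocab([opcodes], n=2)
    print(extract_ngrams(opcodes, 2))
    print(encode(opcodes, vocab, n=2, max_len=8))  # [5, 4, 2, 4, 3, 0, 0, 0]
```

Reserving separate IDs for padding and unknown n-grams keeps the two cases distinguishable to the downstream model; larger N values (such as the 8-gram setting reported above) only change the `n` argument, though they also enlarge the vocabulary considerably.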

Pages: 14

References: 47

