Design of an Iterative Method for Malware Detection Using Autoencoders and Hybrid Machine Learning Models

Cited by: 0

Authors:

Beg, Rijvan [1]

Pateriya, R. K. [1]

Tomar, Deepak Singh [1]

Affiliation:

[1] Maulana Azad Natl Inst Technol, Comp Sci & Engn Dept, Bhopal 462003, Madhya Pradesh, India

Source:

IEEE ACCESS | 2024 / Vol. 12

Keywords:

Malware; Feature extraction; Training; Machine learning; Convolutional neural networks; Data models; Robustness; Deep learning; Adaptation models; Accuracy; Autoencoders; gradient boosted decision trees; adversarial training; malware analysis; machine learning techniques; DETECTION SYSTEM

DOI:

10.1109/ACCESS.2024.3491185

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

In the evolving cyber threat landscape, malware detection and analysis is one of the most visible and pernicious challenges. Traditional detection and analysis methods struggle with high-dimensional data, weak robustness against adversarial attacks, and inefficient use of unlabeled samples. To address these problems, we propose a comprehensive framework that applies machine learning methods to enhance evidence collection and malware activity analysis. The framework combines several advanced machine learning methods. First, we apply an autoencoder-based feature learning technique that reduces the dimensionality of raw malware activity data by 50% while preserving critical information, as evidenced by minimal reconstruction error. This technique extracts compact, informative feature representations that cover both global and local discriminative patterns for accurate malware detection. Second, we further improve the model by applying Gradient Boosted Decision Trees (GBDT) to features derived from Convolutional Neural Networks (CNN). The hybrid model combines the outlier robustness and heterogeneous data handling of GBDTs with the hierarchical feature extraction of CNNs, yielding a significant improvement in performance, with an F1-score of 0.95 on a validation set. Third, to defend against evasion attacks, we incorporate adversarial training using Generative Adversarial Networks (GANs), which reduces adversarial success rates by 60%. The model is trained on adversarial examples, and its parameters are optimized to minimize classification loss across both normal and distorted inputs, thereby enhancing robustness.
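The dimensionality-halving step can be illustrated with a minimal sketch. This is not the authors' architecture, which the abstract does not specify: it assumes a purely linear autoencoder trained by full-batch gradient descent, and the synthetic 4-dimensional data, the helper names `matvec` and `train_linear_autoencoder`, and the hyperparameters are all hypothetical choices made for illustration.

```python
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def train_linear_autoencoder(data, code_dim, steps=400, lr=0.05, seed=0):
    """Fit encoder/decoder matrices by gradient descent on squared reconstruction error."""
    rng = random.Random(seed)
    in_dim, n = len(data[0]), len(data)
    enc = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(code_dim)]
    dec = [[rng.uniform(-0.1, 0.1) for _ in range(code_dim)] for _ in range(in_dim)]
    history = []  # mean reconstruction error, measured before each update
    for _ in range(steps):
        g_enc = [[0.0] * in_dim for _ in range(code_dim)]
        g_dec = [[0.0] * code_dim for _ in range(in_dim)]
        loss = 0.0
        for x in data:
            z = matvec(enc, x)                               # compressed code
            r = [y - t for y, t in zip(matvec(dec, z), x)]   # reconstruction residual
            loss += sum(e * e for e in r)
            # Error propagated back to the code layer (decoder transpose times residual).
            back = [sum(dec[i][k] * r[i] for i in range(in_dim)) for k in range(code_dim)]
            for i in range(in_dim):
                for k in range(code_dim):
                    g_dec[i][k] += 2.0 * r[i] * z[k]
            for k in range(code_dim):
                for j in range(in_dim):
                    g_enc[k][j] += 2.0 * back[k] * x[j]
        history.append(loss / n)
        for i in range(in_dim):
            for k in range(code_dim):
                dec[i][k] -= lr * g_dec[i][k] / n
        for k in range(code_dim):
            for j in range(in_dim):
                enc[k][j] -= lr * g_enc[k][j] / n
    return enc, dec, history

# Hypothetical data: 4 features that depend on only 2 underlying factors.
rng = random.Random(1)
data = []
for _ in range(50):
    a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    data.append([a, b, a + b, a - b])

# Halve the dimensionality: 4 input features -> 2 code features.
enc, dec, history = train_linear_autoencoder(data, code_dim=len(data[0]) // 2)
code = matvec(enc, data[0])
print(len(data[0]), "->", len(code))
print("reconstruction error:", history[0], "->", history[-1])
```

Because the synthetic features depend on only two factors, a two-dimensional code can in principle reconstruct them, which is the sense in which a low reconstruction error indicates that information was preserved. Real malware activity data would likely need nonlinear layers and a deep learning library.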
To extend the applicability of the framework, we apply semi-supervised self-training with Variational Autoencoders (VAEs) to exploit both labeled and unlabeled samples. This approach not only improves anomaly detection by 30% but also allows the model to learn probabilistic latent representations that reveal underlying data structures. Finally, we address temporal malware activity analysis with Long Short-Term Memory (LSTM) networks augmented with an attention mechanism. This configuration allows the model to detect and adapt to evolving attack patterns, improving zero-day attack detection by 25%.
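The attention mechanism layered on the LSTM can be sketched in a few lines. The abstract does not state which attention formulation the paper uses, so this sketch assumes simple dot-product scoring against a query vector; the function names `softmax` and `attention_pool`, and the small hand-written vectors standing in for LSTM hidden states, are hypothetical.

```python
import math

def softmax(scores):
    """Convert raw scores into weights that are positive and sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, query):
    """Weight each time step by its dot-product similarity to the query."""
    scores = [sum(h * q for h, q in zip(state, query)) for state in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(w * state[d] for w, state in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return context, weights

# Stand-ins for per-time-step LSTM outputs (three steps, two features each).
hidden = [[0.1, 0.0], [0.9, 0.2], [0.2, 0.8]]
query = [1.0, 0.0]  # assumed query vector; in practice it would be learned

context, weights = attention_pool(hidden, query)
print("weights:", weights)
print("context:", context)
```

The time step whose hidden state aligns best with the query receives the largest weight, so the pooled context vector emphasizes the parts of an activity sequence that the model considers most relevant.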


Pages: 175032-175055

Page count: 24
