Design of an Iterative Method for Malware Detection Using Autoencoders and Hybrid Machine Learning Models

Cited by: 0
Authors
Beg, Rijvan [1]
Pateriya, R. K. [1]
Tomar, Deepak Singh [1]
Affiliations
[1] Maulana Azad Natl Inst Technol, Comp Sci & Engn Dept, Bhopal 462003, Madhya Pradesh, India
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Malware; Feature extraction; Training; Machine learning; Convolutional neural networks; Data models; Robustness; Deep learning; Adaptation models; Accuracy; Autoencoders; gradient boosted decision trees; adversarial training; malware analysis; machine learning techniques; DETECTION SYSTEM;
DOI
10.1109/ACCESS.2024.3491185
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Subject classification code
0812;
Abstract
In the evolving cyber threat landscape, malware activity detection and analysis is one of the most visible and pernicious challenges. Traditional detection and analysis methods struggle with high-dimensional data, weak robustness against adversarial attacks, and inefficient use of unlabeled samples. In this context, we propose a comprehensive framework that applies machine learning methods to enhance evidence collection and malware activity analysis. The proposed model combines several advanced machine learning methods. First, we apply an autoencoder-based feature learning technique that reduces the dimensionality of raw malware activity data by 50% while preserving critical information, as evidenced by minimal reconstruction error. This technique helps extract compact, informative feature representations that cover both global and local discriminative patterns for accurate malware detection. We further improve the capability of the model by applying Gradient Boosted Decision Trees (GBDT) to features derived from Convolutional Neural Networks (CNN). The hybrid model combines the outlier robustness and heterogeneous data handling of GBDTs with the hierarchical feature extraction of CNNs, resulting in a significant improvement in performance, with an F1-score of 0.95 on a validation set. To defend against evasion attacks, we incorporate adversarial training using Generative Adversarial Networks (GANs). This enables effective counteraction against adversarial strategies, reducing adversarial success rates by 60%. The model is trained on adversarial examples, and its parameters are optimized to minimize classification loss across both normal and distorted inputs, thereby enhancing robustness.
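The autoencoder step described above (a bottleneck half as wide as the input, judged by reconstruction error) can be illustrated with a minimal sketch. This is not the authors' architecture: it assumes a plain linear autoencoder trained by gradient descent in NumPy, and the synthetic data, the function names `make_low_rank_data` and `train_autoencoder`, and all hyperparameters are hypothetical choices made only for illustration.

```python
import numpy as np


def make_low_rank_data(n=200, d=16, rank=4, noise=0.01, seed=0):
    """Synthetic stand-in for activity features: d columns driven by `rank` hidden factors."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(n, rank))
    mixing = rng.normal(scale=0.5, size=(rank, d))
    return latent @ mixing + noise * rng.normal(size=(n, d))


def train_autoencoder(X, ratio=0.5, lr=0.1, epochs=2000, seed=0):
    """Train a linear autoencoder whose bottleneck is `ratio` times the input width.

    Returns the encoder weights, decoder weights, and the per-epoch
    mean squared reconstruction error.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = max(1, int(d * ratio))          # ratio=0.5 halves the dimensionality
    W_enc = rng.normal(scale=0.1, size=(d, k))
    W_dec = rng.normal(scale=0.1, size=(k, d))
    losses = []
    for _ in range(epochs):
        Z = X @ W_enc                   # compressed representation
        err = Z @ W_dec - X             # reconstruction residual
        losses.append(float(np.mean(err ** 2)))
        scale = 2.0 / (n * d)           # derivative of the mean over n*d entries
        grad_dec = scale * (Z.T @ err)
        grad_enc = scale * (X.T @ (err @ W_dec.T))
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return W_enc, W_dec, losses


X = make_low_rank_data()
W_enc, W_dec, losses = train_autoencoder(X)
codes = X @ W_enc                       # features a downstream classifier would consume
print("input width:", X.shape[1], "code width:", codes.shape[1])
print("reconstruction MSE, first vs last epoch:", losses[0], losses[-1])
```

In a pipeline like the one the abstract outlines, `codes` would be the compact features passed to later stages; a nonlinear encoder and real activity data would replace the assumptions made here.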
To broaden the applicability of the framework, we use semi-supervised self-training with Variational Autoencoders (VAEs) to exploit both labeled and unlabeled samples. This approach not only improves anomaly detection by 30% but also allows the model to learn probabilistic latent representations, thereby revealing underlying data structures. Finally, we address the challenge of temporal malware activity analysis through Long Short-Term Memory (LSTM) networks augmented with an attention mechanism. This configuration allows the model to detect and adapt to evolving attack patterns, improving zero-day attack detection by 25%.
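The attention mechanism mentioned for the temporal model can be sketched as a weighted pooling over per-time-step hidden states. The abstract does not say which attention variant is used, so this assumes a simple dot-product scoring against a single learned vector; the hidden states `H` would come from a recurrent encoder such as an LSTM, and the names `attention_pool` and `w` are hypothetical.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def attention_pool(H, w):
    """Collapse a sequence of hidden states into one context vector.

    H: array of shape (T, h), one hidden state per time step.
    w: array of shape (h,), a scoring vector (learned in a real model).
    Returns the context vector of shape (h,) and the attention weights of shape (T,).
    """
    scores = H @ w              # one relevance score per time step
    weights = softmax(scores)   # non-negative, sums to 1
    context = weights @ H       # weighted average of the hidden states
    return context, weights


# Toy sequence of three time steps; the last one scores highest under w.
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [5.0, 0.0]])
w = np.array([1.0, 0.0])
context, weights = attention_pool(H, w)
print("attention weights:", weights)
print("context vector:", context)
```

The weights show which time steps dominate the pooled representation, which is what lets such a model emphasize the portions of an activity trace most relevant to the classification.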
Pages: 175032-175055
Page count: 24