Mecha: A Neural-Symbolic Open-Set Homogeneous Decision Fusion Approach for Zero-Day Malware Similarity Detection

Authors
Molloy, Christopher [1]
Banks, Jeremy [1]
Ding, Steven H. H. [1]
Alaca, Furkan [1]
Charland, Philippe [2]
Walenstein, Andrew [3]
Affiliations
[1] Queens Univ, Sch Comp, Kingston, ON K7L2N8, Canada
[2] Def R&D Canada Valcartier, Mission Crit Cyber Secur Sect, Quebec City, PQ G0A 4Z0, Canada
[3] BlackBerry, Secur Res & Dev, Waterloo, ON N2K 0A7, Canada
Keywords
Malware; Training; Codes; Accuracy; Neural networks; Visualization; Threat modeling; Research and development; Reinforcement learning; Optimization; Deep learning; cybersecurity; malware analysis; SMALL WORLD
DOI
10.1109/TSE.2025.3531210
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
With increasing numbers of novel malware each year, tools are required for efficient and accurate variant matching under the same family, for the purpose of effective proactive threat detection, retro-hunting, and attack campaign tracking. All of the state-of-the-art Deep Learning (DL) approaches assume that the incoming samples originate from known families and incorrectly identify novel families. Additionally, most of the existing solutions that leverage the Siamese Neural Network architecture either rely on pair-wise comparisons or computationally expensive preprocessing steps that are not scalable to a real-world malware triage volume requirement. We propose a different route, Mecha, a Neural-Symbolic Machine Learning (ML) system for malware variant matching and zero-day family detection. Mecha is comprised of an embedding network trained in two different scenarios for byte string embedding and an open-set approximate nearest neighbour algorithm for variant matching and zero-day detection. Our embedding network uses triplet loss for embedding generation and reinforcement-based Expectation Maximization (EM) learning for full deployment optimization. We conduct multiple in-sample and out-of-sample experiments to demonstrate the model's generalizability toward novel variants and families. We also show that Mecha can detect samples outside the known set of malware samples with an accuracy greater than 0.990.
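The two ideas named in the abstract, triplet loss for embedding training and nearest-neighbour matching with an open-set rejection step, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the distance metric, the margin, or the rejection rule, so the Euclidean distance, the `margin` and `threshold` parameters, and the helper names `triplet_loss` and `match_or_flag` are all assumptions made for illustration. An exact brute-force search also stands in here for the approximate nearest neighbour algorithm the abstract mentions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the same-family sample (positive) is closer to
    the anchor than the other-family sample (negative) by at least `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def match_or_flag(query, index, threshold):
    """Nearest-neighbour matching with a hypothetical open-set rejection rule.

    `index` is a list of (family_label, embedding) pairs for known samples.
    Returns (family, distance) for the closest known sample, or
    (None, distance) when even the closest sample is farther than
    `threshold`, i.e. the query is treated as a possible novel family.
    """
    family, dist = min(
        ((fam, euclidean(query, emb)) for fam, emb in index),
        key=lambda pair: pair[1],
    )
    return (family, dist) if dist <= threshold else (None, dist)


# Toy data: two known families in a 2-D embedding space (made-up values).
known = [("famA", [0.0, 0.0]), ("famA", [0.1, 0.0]), ("famB", [5.0, 5.0])]
near_family, _ = match_or_flag([0.05, 0.1], known, threshold=1.0)
far_family, _ = match_or_flag([10.0, -10.0], known, threshold=1.0)
```

In this toy setup the first query lands next to the `famA` embeddings and is matched to that family, while the second query is far from every known embedding and is returned with a family of `None`. A real deployment would replace the brute-force loop with an approximate index and would need a principled way to choose the rejection threshold.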
Pages: 621 - 637
Page count: 17
Related papers
7 in total
  • [1] Adversarial Variational Modality Reconstruction and Regularization for Zero-Day Malware Variants Similarity Detection
    Molloy, Christopher
    Banks, Jeremy
    Ding, Steven H. H.
    Charland, Philippe
    Walenstein, Andrew
    Li, Litao
    2022 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2022, : 1131 - 1136
  • [2] Zero-X: A Blockchain-Enabled Open-Set Federated Learning Framework for Zero-Day Attack Detection in IoV
    Korba, Abdelaziz Amara
    Boualouache, Abdelwahab
    Ghamri-Doudane, Yacine
    IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2024, 73 (09) : 12399 - 12414
  • [3] A Reinforcement Learning-Based Approach for Detection Zero-Day Malware Attacks on IoT System
    Ngo, Quoc-Dung
    Nguyen, Quoc-Huu
    ARTIFICIAL INTELLIGENCE TRENDS IN SYSTEMS, VOL 2, 2022, 502 : 381 - 394
  • [4] Zero-Day Malware Detection and Effective Malware Analysis Using Shapley Ensemble Boosting and Bagging Approach
    Kumar, Rajesh
    Subbiah, Geetha
    SENSORS, 2022, 22 (07)
  • [5] Deep Neural Network and Transfer Learning for Accurate Hardware-Based Zero-Day Malware Detection
    He, Zhangying
    Rezaei, Amin
    Homayoun, Houman
    Sayadi, Hossein
    PROCEEDINGS OF THE 32ND GREAT LAKES SYMPOSIUM ON VLSI 2022, GLSVLSI 2022, 2022, : 27 - 32
  • [6] Zero-Day Aware Decision Fusion-Based Model for Crypto-Ransomware Early Detection
    Al-rimy, Bander Ali Saleh
    Maarof, Mohd Aizaini
    Prasetyo, Yuli Adam
    Shaid, Syed Zainudeen Mohd
    Ariffin, Aswami Fadillah Mohd
    INTERNATIONAL JOURNAL OF INTEGRATED ENGINEERING, 2018, 10 (06) : 82 - 88
  • [7] Breakthrough to Adaptive and Cost-Aware Hardware-Assisted Zero-Day Malware Detection: A Reinforcement Learning-Based Approach
    He, Zhangying
    Makrani, Hosein Mohammadi
    Rafatirad, Setareh
    Homayoun, Houman
    Sayadi, Hossein
    2022 IEEE 40TH INTERNATIONAL CONFERENCE ON COMPUTER DESIGN (ICCD 2022), 2022, : 231 - 238