Deep Hashing for Malware Family Classification and New Malware Identification

Cited by: 2
Authors
Zhang, Yunchun [1]
Liao, Zikun [1]
Zhang, Ning [1]
Min, Shaohui [1]
Wang, Qi [1]
Quek, Tony Q. S. [2]
Zhao, Mingxiong [1]
Affiliations
[1] Yunnan Univ, Engn Res Ctr Cyberspace, Natl Pilot Sch Software, Kunming 650500, Peoples R China
[2] Singapore Univ Technol & Design, Informat Syst Technol & Design, Singapore 487372, Singapore
Source
IEEE INTERNET OF THINGS JOURNAL | 2024 / Vol. 11 / Issue 16
Funding
National Natural Science Foundation of China
Keywords
Malware; Feature extraction; Image retrieval; Image classification; Artificial neural networks; Internet of Things; Semantics; Deep hashing; deep neural networks (DNNs); malware classification; malware images; NETWORK
DOI
10.1109/JIOT.2024.3353250
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Although numerous state-of-the-art deep neural networks have recently been proposed for malware classification, effectively detecting malware in large-scale sample sets and identifying zero-day or new malware variants remain significant challenges. To address this, a deep hashing-based malware classification model is designed with two parts: 1) ResNet50-based deep hashing for malware retrieval and 2) voting-based malware classification. In the first part, multiple deep hashing models are developed by extracting the high-layer outputs (feature maps) of a ResNet50 trained on malware gray-scale images. To maximize the Hamming distance (dissimilarity) between hash values computed from malware samples of different families, a ResNet50-based deep polarized network (RNDPN) is designed to return the Top K similar samples. In the second part, we propose majority voting and Hamming-distance-based voting for malware identification based on the retrieved results. The experimental results show that RNDPN outperforms the other six deep hashing models, reaching 97.54% mean average precision (mAP) for malware retrieval when only 40 similar examples are retrieved; all deep hashing models achieve their best results with a 48-bit hash code length. Furthermore, Hamming-distance-based voting implemented with RNDPN outperforms the other models in malware classification, achieving 1) a malware classification accuracy of 96.5% and 2) an accuracy of 85.7% in identifying new or zero-day malware.
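The retrieve-then-vote idea in the abstract can be illustrated with a minimal sketch. It assumes hash codes are already available as 48-bit integers (the network that produces them is out of scope here), and the names `hamming`, `top_k`, `majority_vote`, `distance_weighted_vote`, and `DATABASE` are hypothetical. The abstract does not give the exact voting formula, so the linear weight `1 - distance / code_bits` below is an assumption chosen for illustration, not the authors' rule.

```python
from collections import Counter, defaultdict

CODE_BITS = 48  # hash code length reported as best in the abstract


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer-encoded hash codes."""
    return bin(a ^ b).count("1")


def top_k(query: int, database, k: int):
    """Return the k nearest (distance, family) pairs, closest first."""
    ranked = sorted(database, key=lambda item: hamming(query, item[0]))
    return [(hamming(query, code), family) for code, family in ranked[:k]]


def majority_vote(neighbors) -> str:
    """Pick the family that appears most often among the neighbors."""
    counts = Counter(family for _, family in neighbors)
    return counts.most_common(1)[0][0]


def distance_weighted_vote(neighbors, code_bits: int = CODE_BITS) -> str:
    """Pick the family with the largest similarity-weighted score.

    Assumed weighting: each neighbor contributes 1 - distance / code_bits,
    so closer neighbors count more than distant ones.
    """
    scores = defaultdict(float)
    for dist, family in neighbors:
        scores[family] += 1.0 - dist / code_bits
    return max(scores, key=scores.get)


# Toy database of (hash code, family) pairs; the values are made up.
DATABASE = [
    (0b1, "A"),             # distance 1 from the query 0
    (0b11, "A"),            # distance 2
    ((1 << 20) - 1, "B"),   # distance 20
    ((1 << 21) - 1, "B"),   # distance 21
    ((1 << 22) - 1, "B"),   # distance 22
    ((1 << 40) - 1, "C"),   # distance 40, outside the top 5
]

neighbors = top_k(0, DATABASE, k=5)
print(majority_vote(neighbors))           # B: three of the five neighbors
print(distance_weighted_vote(neighbors))  # A: two very close neighbors outweigh three distant ones
```

The toy data is chosen so the two schemes disagree: counting alone favors the family with more retrieved samples, while distance weighting favors the family whose samples sit much closer to the query in Hamming space.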
Pages: 26837-26851
Page count: 15