Deep Hashing for Malware Family Classification and New Malware Identification

Cited by: 2

Authors:

Zhang, Yunchun [1]

Liao, Zikun [1]

Zhang, Ning [1]

Min, Shaohui [1]

Wang, Qi [1]

Quek, Tony Q. S. [2]

Zhao, Mingxiong [1]

Affiliations:

[1] Yunnan Univ, Engn Res Ctr Cyberspace, Natl Pilot Sch Software, Kunming 650500, Peoples R China

[2] Singapore Univ Technol & Design, Informat Syst Technol & Design, Singapore 487372, Singapore

Source:

IEEE INTERNET OF THINGS JOURNAL | 2024 / Vol. 11 / Issue 16

Funding:

National Natural Science Foundation of China;

Keywords:

Malware; Feature extraction; Image retrieval; Image classification; Artificial neural networks; Internet of Things; Semantics; Deep hashing; deep neural networks (DNNs); malware classification; malware images; NETWORK;

DOI:

10.1109/JIOT.2024.3353250

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Although numerous state-of-the-art deep neural networks have recently been proposed for malware classification, detecting malware effectively on large-scale sample sets and identifying zero-day or new malware variants remain significant challenges. To address these challenges, a deep hashing-based malware classification model is designed for malware identification. It consists of two parts: 1) ResNet50-based deep hashing for malware retrieval and 2) voting-based malware classification. In the first part, multiple deep hashing models are developed by extracting the high-layer outputs (feature maps) from a ResNet50 trained on malware gray-scale images. To maximize the Hamming distance, or dissimilarity, among hash values computed from malware samples of different families, a ResNet50-based deep polarized network (RNDPN) is designed to return the Top K similar samples. In the second part, we propose a majority-voting method and a Hamming-distance-based voting method for malware identification according to the retrieved results. The experimental results show that RNDPN outperforms the other six deep hashing models, with 97.54% mean average precision (mAP) for malware retrieval when only 40 similar examples are retrieved; all deep hashing models achieve their best results with a 48-bit hash code length. Furthermore, the Hamming-distance-based voting method implemented with RNDPN outperforms the other models in malware classification in two key respects: 1) a malware classification accuracy of 96.5% and 2) an accuracy of 85.7% in identifying new or zero-day malware.
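
The retrieval-then-vote idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the hash codes are toy 8-bit integers rather than codes produced by RNDPN, the function names (`hamming`, `top_k`, `majority_vote`, `distance_weighted_vote`) are hypothetical, and the weighting `1 - distance / code_bits` is an assumption, since the abstract does not specify how Hamming distance enters the vote.

```python
from collections import Counter, defaultdict


def hamming(a: int, b: int) -> int:
    """Count the differing bits between two hash codes stored as ints."""
    return bin(a ^ b).count("1")


def top_k(query, database, k):
    """Return the k nearest (distance, family) pairs, closest first."""
    scored = [(hamming(query, code), family) for code, family in database]
    return sorted(scored)[:k]


def majority_vote(neighbors):
    """Pick the family that appears most often among the retrieved samples."""
    return Counter(family for _, family in neighbors).most_common(1)[0][0]


def distance_weighted_vote(neighbors, code_bits):
    """Pick the family with the largest summed similarity.

    ASSUMPTION: each neighbor contributes 1 - distance / code_bits;
    the paper's actual weighting may differ.
    """
    scores = defaultdict(float)
    for dist, family in neighbors:
        scores[family] += 1.0 - dist / code_bits
    return max(scores, key=scores.get)


# Toy database of (hash code, family label) pairs; labels are made up.
database = [
    (0b00000000, "A"),
    (0b00000001, "A"),
    (0b11110000, "B"),
    (0b11110001, "B"),
    (0b11110011, "B"),
]
query = 0b00000011

neighbors = top_k(query, database, k=5)
print(neighbors)
print("majority vote:", majority_vote(neighbors))
print("distance-weighted vote:", distance_weighted_vote(neighbors, code_bits=8))
```

In this toy example the two schemes disagree: family "B" has more retrieved samples, but the two "A" samples are much closer to the query, so a distance-aware vote favors "A". That kind of case is where a Hamming-distance-based vote could plausibly differ from plain majority voting.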


Pages: 26837 - 26851

Number of pages: 15

