Unsuccessful Story about Few Shot Malware Family Classification and Siamese Network to the Rescue

Cited by: 30
Authors
Bai, Yude [1]
Xing, Zhenchang [2]
Li, Xiaohong [1]
Feng, Zhiyong [3]
Ma, Duoyuan [1]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Sch Comp Sci & Technol, Tianjin Key Lab Adv Networking TANK, Tianjin, Peoples R China
[2] Australian Natl Univ, Data61 CSIRO, Res Sch Comp Sci, Canberra, ACT, Australia
[3] Tianjin Univ, Coll Intelligence & Comp, Sch Comp Software, Tianjin, Peoples R China
Source
2020 ACM/IEEE 42ND INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2020) | 2020
Funding
National Science Foundation (USA);
Keywords
Malware family classification; Few shot learning; Siamese network;
DOI
10.1145/3377811.3380354
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
To battle the ever-increasing volume of Android malware, malware family classification, which groups malware with common features into a malware family, has been proposed as an effective malware analysis method. Several machine-learning-based approaches have been proposed for this task. Our study shows that malware families suffer from severe data imbalance: many families contain only a small number of malware applications (referred to as few shot malware families in this work). Unfortunately, this issue has been overlooked in existing approaches. Although existing approaches achieve high classification performance at the overall level and for large malware families, our experiments show that they suffer from poor performance and generalizability for few shot malware families, and that traditional downsampling methods cannot solve the problem. To address the challenge of few shot malware family classification, we propose a novel siamese-network-based learning method, which allows us to train an effective MultiLayer Perceptron (MLP) network for embedding malware applications into a real-valued, continuous vector space by contrasting malware applications from the same or different families. In the embedding space, the performance of malware family classification can be significantly improved for all scales of malware families, especially for few shot malware families, which also leads to a significant performance improvement at the overall level.
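The idea of contrasting pairs from the same or different families can be illustrated with a minimal sketch. This is not the authors' implementation: the single-layer `embed` function, the margin value, and the squared-distance form of `contrastive_loss` are all assumptions, chosen because they are a common way to train siamese networks. The abstract does not specify the loss or the MLP architecture.

```python
import math

def embed(x, weights, biases):
    """Hypothetical stand-in for the MLP: one linear layer followed by ReLU.

    Both inputs of a siamese pair pass through this same function,
    so they share the same weights.
    """
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]

def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def contrastive_loss(dist, same_family, margin=1.0):
    """Assumed contrastive loss on one pair.

    Same-family pairs are penalized by their squared distance;
    different-family pairs are penalized only when they are closer
    than `margin`.
    """
    if same_family:
        return dist ** 2
    return max(0.0, margin - dist) ** 2

def pair_loss(x1, x2, same_family, weights, biases, margin=1.0):
    """Embed both samples with shared weights and score the pair."""
    d = euclidean(embed(x1, weights, biases), embed(x2, weights, biases))
    return contrastive_loss(d, same_family, margin)
```

Under these assumptions, minimizing `pair_loss` over many pairs pulls same-family applications together and pushes different-family applications at least `margin` apart, which is the property a downstream classifier in the embedding space would rely on.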
Pages: 1560-1571
Page count: 12