Unsuccessful Story about Few Shot Malware Family Classification and Siamese Network to the Rescue

Cited by: 24
Authors
Bai, Yude [1]
Xing, Zhenchang [2]
Li, Xiaohong [1]
Feng, Zhiyong [3]
Ma, Duoyuan [1]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Sch Comp Sci & Technol, Tianjin Key Lab Adv Networking TANK, Tianjin, Peoples R China
[2] Australian Natl Univ, Data61 CSIRO, Res Sch Comp Sci, Canberra, ACT, Australia
[3] Tianjin Univ, Coll Intelligence & Comp, Sch Comp Software, Tianjin, Peoples R China
Source
2020 ACM/IEEE 42ND INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2020) | 2020
Funding
U.S. National Science Foundation;
Keywords
Malware family classification; Few shot learning; Siamese network;
DOI
10.1145/3377811.3380354
CLC number
TP31 [Computer software];
Discipline classification codes
081202; 0835;
Abstract
To battle the ever-increasing amount of Android malware, malware family classification, which groups malware with common features into a malware family, has been proposed as an effective malware analysis method. Several machine-learning-based approaches have been proposed for this task. Our study shows that malware families suffer from severe data imbalance: many families contain only a small number of malware applications (referred to as few shot malware families in this work). Unfortunately, this issue has been overlooked by existing approaches. Although existing approaches achieve high classification performance at the overall level and for large malware families, our experiments show that they perform poorly and generalize badly on few shot malware families, and that traditional downsampling methods cannot solve the problem. To address the challenge of few shot malware family classification, we propose a novel siamese-network-based learning method, which allows us to train an effective MultiLayer Perceptron (MLP) network that embeds malware applications into a real-valued, continuous vector space by contrasting malware applications from the same or different families. In this embedding space, malware family classification performance improves significantly for malware families of all sizes, especially for few shot malware families, which in turn yields a significant performance improvement at the overall level.
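The abstract does not specify the network's layer sizes, loss function, pair-sampling scheme, or the classifier applied in the embedding space. As a rough illustration only, the sketch below assumes the classic margin-based contrastive loss, a single linear layer in place of the authors' MLP, family-balanced pair sampling, and a 1-nearest-neighbour classifier. All function names (`embed`, `contrastive_loss`, `make_pairs`, `train`, `predict`) and the toy feature vectors are hypothetical and are not taken from the paper.

```python
import math
import random

# Hypothetical sketch of a siamese embedding trained with a contrastive loss.
# Assumption: one linear layer W stands in for the paper's MLP. Both inputs of a
# pair pass through the SAME weights W, which is what makes the network siamese.


def embed(W, x):
    """Map feature vector x into the embedding space with the shared weights W."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dist(a, b):
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def contrastive_loss(d, same, margin=1.0):
    """Same-family pairs are pulled together; different families are pushed at least margin apart."""
    return d ** 2 if same else max(0.0, margin - d) ** 2


def make_pairs(X, y, n_pairs, rng):
    """Alternate same-/different-family pairs; the anchor family is chosen uniformly
    so that few shot families are picked as anchors as often as large ones."""
    families = sorted(set(y))
    members = {f: [i for i, lab in enumerate(y) if lab == f] for f in families}
    pairs = []
    for k in range(n_pairs):
        fam = rng.choice(families)
        i = rng.choice(members[fam])
        same_pool = [j for j in members[fam] if j != i]
        if k % 2 == 0 and same_pool:
            pairs.append((i, rng.choice(same_pool), True))
        else:
            other = rng.choice([f for f in families if f != fam])
            pairs.append((i, rng.choice(members[other]), False))
    return pairs


def train(X, y, dim=2, margin=1.0, lr=0.05, epochs=200, n_pairs=64, seed=0):
    """Full-batch gradient descent on the mean contrastive loss over a fixed pair set."""
    rng = random.Random(seed)
    n_feat = len(X[0])
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_feat)] for _ in range(dim)]
    pairs = make_pairs(X, y, n_pairs, rng)
    for _ in range(epochs):
        grad = [[0.0] * n_feat for _ in range(dim)]
        for i, j, same in pairs:
            u = [a - b for a, b in zip(X[i], X[j])]  # input difference
            e = embed(W, u)  # equals embed(W, X[i]) - embed(W, X[j]) because W is linear
            d = math.sqrt(sum(v * v for v in e))
            if same:
                coef = 2.0  # gradient of d^2 = ||W u||^2 is 2 e u^T
            elif 1e-12 < d < margin:
                coef = -2.0 * (margin - d) / d  # gradient of (margin - d)^2
            else:
                continue  # zero loss beyond the margin; gradient undefined at d = 0
            for r in range(dim):
                for c in range(n_feat):
                    grad[r][c] += coef * e[r] * u[c]
        for r in range(dim):
            for c in range(n_feat):
                W[r][c] -= lr * grad[r][c] / len(pairs)
    return W


def predict(W, support_X, support_y, query):
    """1-nearest-neighbour family label, measured in the embedding space."""
    q = embed(W, query)
    best = min(range(len(support_X)), key=lambda k: dist(q, embed(W, support_X[k])))
    return support_y[best]


if __name__ == "__main__":
    # Made-up 4-dimensional feature vectors; family "few" has only two samples.
    X = [[1, 1, 0, 0], [1, 0.9, 0, 0.1], [0.9, 1, 0.1, 0], [1, 1, 0.1, 0.1], [0.8, 1, 0, 0],
         [1, 0, 1, 0], [0.9, 0.1, 1, 0], [1, 0, 0.9, 0.1],
         [0, 0, 1, 1], [0.1, 0, 1, 0.9]]
    y = ["big"] * 5 + ["mid"] * 3 + ["few"] * 2
    W = train(X, y)
    print(predict(W, X, y, [0, 0.1, 0.9, 1]))
```

Choosing the anchor family uniformly is one simple way to keep few shot families from being swamped by large ones when pairs are built; whether the authors balance their pairs this way is not stated in the abstract.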
Pages: 1560-1571
Page count: 12