Unsuccessful Story about Few Shot Malware Family Classification and Siamese Network to the Rescue

Cited by: 24
Authors
Bai, Yude [1]
Xing, Zhenchang [2]
Li, Xiaohong [1]
Feng, Zhiyong [3]
Ma, Duoyuan [1]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Sch Comp Sci & Technol, Tianjin Key Lab Adv Networking TANK, Tianjin, Peoples R China
[2] Australian Natl Univ, Data61 CSIRO, Res Sch Comp Sci, Canberra, ACT, Australia
[3] Tianjin Univ, Coll Intelligence & Comp, Sch Comp Software, Tianjin, Peoples R China
Source
2020 ACM/IEEE 42ND INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2020) | 2020
Funding
National Science Foundation (USA);
Keywords
Malware family classification; Few shot learning; Siamese network;
DOI
10.1145/3377811.3380354
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification codes
081202 ; 0835 ;
Abstract
To battle the ever-increasing volume of Android malware, malware family classification, which groups malware with common features into a malware family, has been proposed as an effective malware analysis method. Several machine-learning-based approaches have been proposed for this task. Our study shows that malware families suffer from severe data imbalance: many families contain only a small number of malware applications (referred to as few shot malware families in this work). Unfortunately, this issue has been overlooked in existing approaches. Although existing approaches achieve high classification performance at the overall level and for large malware families, our experiments show that they suffer from poor performance and generalizability for few shot malware families, and that the traditional downsampling method cannot solve the problem. To address the challenge of few shot malware family classification, we propose a novel siamese-network-based learning method, which allows us to train an effective MultiLayer Perceptron (MLP) network that embeds malware applications into a real-valued, continuous vector space by contrasting malware applications from the same or different families. In the embedding space, the performance of malware family classification improves significantly for malware families of all sizes, especially for few shot malware families, which in turn leads to a significant performance improvement at the overall level.
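The idea of contrasting applications from the same or different families through a shared MLP can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the loss function, layer sizes, or input features, so the margin-based contrastive loss, the single hidden layer, and all names here (`make_mlp`, `embed`, `contrastive_loss`, `make_pairs`) are assumptions chosen for illustration only.

```python
import math
import random


def make_mlp(in_dim, hidden_dim, out_dim, seed=0):
    """Create randomly initialised weights for a one-hidden-layer MLP (hypothetical sizes)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "W1": mat(hidden_dim, in_dim), "b1": [0.0] * hidden_dim,
        "W2": mat(out_dim, hidden_dim), "b2": [0.0] * out_dim,
    }


def embed(params, x):
    """Map a feature vector to the embedding space: ReLU hidden layer, linear output."""
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(params["W1"], params["b1"])
    ]
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(params["W2"], params["b2"])
    ]


def euclidean(a, b):
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def contrastive_loss(params, x1, x2, same_family, margin=1.0):
    """Assumed margin-based contrastive loss on one pair.

    Both inputs pass through the SAME weights (the siamese part).
    Same-family pairs are pulled together; different-family pairs
    are pushed apart until they are at least `margin` away.
    """
    d = euclidean(embed(params, x1), embed(params, x2))
    if same_family:
        return d ** 2
    return max(0.0, margin - d) ** 2


def make_pairs(samples):
    """Build all unordered pairs from (feature_vector, family_label) samples."""
    pairs = []
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            (xi, yi), (xj, yj) = samples[i], samples[j]
            pairs.append((xi, xj, yi == yj))
    return pairs
```

Pairing is what makes this setup attractive for small families: n samples yield n(n-1)/2 pairs, so even a family with a handful of applications contributes many training signals. Training (minimising the summed pair losses by gradient descent) is omitted from the sketch.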
Pages: 1560-1571
Page count: 12