Unsuccessful Story about Few Shot Malware Family Classification and Siamese Network to the Rescue

Cited by: 30

Authors:

Bai, Yude [1]

Xing, Zhenchang [2]

Li, Xiaohong [1]

Feng, Zhiyong [3]

Ma, Duoyuan [1]

Affiliations:

[1] Tianjin Univ, Coll Intelligence & Comp, Sch Comp Sci & Technol, Tianjin Key Lab Adv Networking TANK, Tianjin, Peoples R China

[2] Australian Natl Univ, Data61 CSIRO, Res Sch Comp Sci, Canberra, ACT, Australia

[3] Tianjin Univ, Coll Intelligence & Comp, Sch Comp Software, Tianjin, Peoples R China

Source:

2020 ACM/IEEE 42ND INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2020) | 2020

Funding:

National Science Foundation (USA);

Keywords:

Malware family classification; Few shot learning; Siamese network;

DOI:

10.1145/3377811.3380354

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Discipline classification codes:

081202 ; 0835 ;

Abstract:

To battle the ever-increasing Android malware, malware family classification, which classifies malware with common features into a malware family, has been proposed as an effective malware analysis method. Several machine-learning based approaches have been proposed for the task of malware family classification. Our study shows that malware families suffer from severe data imbalance, with many families containing only a small number of malware applications (referred to as few shot malware families in this work). Unfortunately, this issue has been overlooked in existing approaches. Although existing approaches achieve high classification performance at the overall level and for large malware families, our experiments show that they suffer from poor performance and generalizability for few shot malware families, and traditional downsampling methods cannot solve the problem. To address the challenge in few shot malware family classification, we propose a novel siamese-network based learning method, which allows us to train an effective MultiLayer Perceptron (MLP) network for embedding malware applications into a real-valued, continuous vector space by contrasting the malware applications from the same or different families. In the embedding space, the performance of malware family classification can be significantly improved for all scales of malware families, especially for few shot malware families, which also leads to a significant performance improvement at the overall level.
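The siamese idea in the abstract (one shared MLP embeds two applications, and a loss contrasts pairs from the same or different families) can be illustrated with a minimal sketch. This is not the authors' implementation: the layer sizes, the random untrained weights, the Euclidean distance, the margin-based contrastive loss, and all function names below are assumptions chosen for illustration, since the abstract does not specify them.

```python
import math
import random


def make_mlp(in_dim, hidden_dim, out_dim, seed=0):
    """Build a hypothetical two-layer MLP with random (untrained) weights."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(hidden_dim)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)] for _ in range(out_dim)]
    return w1, w2


def embed(mlp, x):
    """Map a feature vector to the embedding space (ReLU hidden layer, linear output)."""
    w1, w2 = mlp
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def distance(a, b):
    """Euclidean distance between two embeddings (an assumed choice of metric)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def contrastive_loss(d, same_family, margin=1.0):
    """Assumed margin-based contrastive loss on a pair distance d.

    Same-family pairs are penalised for being far apart; different-family
    pairs are penalised only while they are closer than the margin.
    """
    if same_family:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


def siamese_pair_loss(mlp, x1, x2, same_family, margin=1.0):
    """Both inputs pass through the SAME network; that sharing is the siamese part."""
    d = distance(embed(mlp, x1), embed(mlp, x2))
    return contrastive_loss(d, same_family, margin)


if __name__ == "__main__":
    mlp = make_mlp(in_dim=4, hidden_dim=8, out_dim=3)
    app_a = [1.0, 0.0, 1.0, 0.0]  # hypothetical binary feature vectors
    app_b = [1.0, 0.0, 0.0, 1.0]
    print(siamese_pair_loss(mlp, app_a, app_b, same_family=True))
    print(siamese_pair_loss(mlp, app_a, app_b, same_family=False))
```

Training would minimise this loss over many sampled pairs, after which a classifier could operate on the embeddings; because pairs rather than single samples are the training unit, even a small family contributes many training examples.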

Pages: 1560-1571

Page count: 12

