Android Malware Familial Classification and Representative Sample Selection via Frequent Subgraph Analysis

Cited by: 158
Authors
Fan, Ming [1, 2]
Liu, Jun [3 ]
Luo, Xiapu [2 ]
Chen, Kai [4, 5]
Tian, Zhenzhou [6 ]
Zheng, Qinghua [1 ]
Liu, Ting [1 ]
Affiliations
[1] Xi An Jiao Tong Univ, MOEKLINNS Lab, Dept Comp Sci & Technol, Xian 710049, Shaanxi, Peoples R China
[2] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Hong Kong, Peoples R China
[3] Xi An Jiao Tong Univ, Dept Comp Sci & Technol, Natl Engn Lab Big Data Analyt, Xian 710049, Shaanxi, Peoples R China
[4] Chinese Acad Sci, Inst Informat Engn, State Key Lab Informat Secur, Beijing 100195, Peoples R China
[5] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing 100195, Peoples R China
[6] Xian Univ Posts & Telecommun, Sch Comp Sci & Technol, Xian 710049, Shaanxi, Peoples R China
Funding
National Key Research and Development Program of China; U.S. National Science Foundation
Keywords
Android malware; frequent subgraph; familial classification; plagiarism detection
DOI
10.1109/TIFS.2018.2806891
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
The rapid increase in the number of Android malware samples poses great challenges to anti-malware systems, because the sheer volume of samples overwhelms malware analysis systems. Classifying malware samples into families, so that the common features shared by samples in the same family can be exploited in malware detection and inspection, is a promising approach for accelerating malware analysis. Furthermore, selecting representative malware samples in each family can drastically decrease the number of samples to be analyzed. However, existing classification solutions are limited for two reasons. First, the legitimate part of the malware may misguide the classification algorithms, because most Android malware is constructed by inserting malicious components into popular apps. Second, polymorphic variants of Android malware can evade detection by employing transformation attacks. In this paper, we propose a novel approach that constructs frequent subgraphs (fregraphs) to represent the common behaviors of malware samples that belong to the same family. Moreover, we propose and develop FalDroid, a novel system that automatically classifies Android malware and selects representative malware samples in accordance with fregraphs. We apply it to 8407 malware samples from 36 families. Experimental results show that FalDroid correctly classifies 94.2% of malware samples into their families, using approximately 4.6 seconds per app. FalDroid can also dramatically reduce the cost of malware investigation by selecting only 8.5% to 22% of the samples as representatives that exhibit the most common malicious behavior among all samples.
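The idea of capturing behavior shared across a family can be illustrated with a deliberately simplified sketch. It is not FalDroid's algorithm: it assumes each sample is already reduced to a set of caller-to-callee edges and keeps only the edges that recur in a chosen fraction of the family, which is frequent-edge counting rather than full frequent subgraph mining. The function name `frequent_edges`, the `min_support` parameter, and the toy edge labels are all hypothetical.

```python
from collections import Counter


def frequent_edges(family_graphs, min_support=0.5):
    """Return the edges that occur in at least `min_support` of the graphs.

    Each graph is an iterable of (caller, callee) pairs. This is a
    simplification: it counts single edges, not connected subgraphs.
    """
    counts = Counter()
    for edges in family_graphs:
        # set() ensures an edge is counted at most once per sample.
        counts.update(set(edges))
    threshold = min_support * len(family_graphs)
    return {edge for edge, n in counts.items() if n >= threshold}


# Hypothetical toy family of three samples (labels are illustrative only).
family = [
    {("onReceive", "sendTextMessage"), ("onCreate", "getDeviceId")},
    {("onReceive", "sendTextMessage"), ("onCreate", "loadUrl")},
    {("onReceive", "sendTextMessage"), ("onCreate", "getDeviceId")},
]

common = frequent_edges(family, min_support=0.6)
```

With a support of 0.6, an edge must appear in at least two of the three samples, so the edge that occurs only once is dropped. A real frequent subgraph miner would additionally grow and match connected structures, which this sketch does not attempt.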
Pages: 1890-1905
Page count: 16