A pragmatic Android malware detection procedure

Cited by: 25
Authors
Palumbo, Paolo [1]
Sayfullina, Luiza [2]
Komashinskiy, Dmitriy [1]
Eirola, Emil [3]
Karhunen, Juha [2]
Affiliations
[1] F Secure Corp, Helsinki, Finland
[2] Aalto Univ, Dept Informat & Comp Sci, Espoo, Finland
[3] Arcada Univ Appl Sci, Helsinki, Finland
Keywords
Android; Malware detection; Static analysis; Machine learning; Classification; Ensemble learning; Feature selection
DOI
10.1016/j.cose.2017.07.013
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The academic security research community has studied the Android malware detection problem extensively. Machine learning methods proposed in previous work typically achieve high reported detection performance on fixed datasets. Some of them also report reasonably fast prediction times. However, most of them are not suitable for real-world deployment because the requirements for malware detection go beyond these figures of merit. In this paper, we introduce several important requirements for deploying Android malware detection systems in the real world. One such requirement is that candidate approaches should be tested against a stream of continuously evolving data. Such streams represent the continuous flow of unknown file objects received for categorization, and they provide a more reliable and realistic estimate of detection performance once deployed in a production environment. As a case study, we designed and implemented an ensemble approach for automatic Android malware detection that meets the real-world requirements we identified. The atomic Naive Bayes classifiers used as inputs for the Support Vector Machine ensemble are based on different APK feature categories, providing fast prediction and, through diversification, additional reliability against attackers. Our case study with several malware families showed that different families are detected by different atomic classifiers. To the best of our knowledge, our work contains the first publicly available results generated against evolving data streams of nearly 1 million samples with a model trained over a massive sample set of 120,000 samples. © 2017 Elsevier Ltd. All rights reserved.
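The ensemble structure described in the abstract (one Naive Bayes classifier per APK feature category, with their outputs combined by a Support Vector Machine) can be illustrated with a minimal sketch. This is not the authors' implementation. The class and function names (`CategoryNB`, `train_linear_svm`, `Ensemble`), the two feature categories, the toy samples, and all hyperparameters are assumptions made for illustration. The SVM here is a plain linear one trained by subgradient descent with no bias term, and it is fit on the same samples as the atomic classifiers, which a real system would likely avoid.

```python
import math
import random


class CategoryNB:
    """Bernoulli Naive Bayes over one feature category, with Laplace smoothing."""

    def fit(self, feature_sets, labels):
        self.vocab = sorted(set().union(*feature_sets))
        self.n = {y: labels.count(y) for y in (0, 1)}  # 0 = clean, 1 = malware
        self.count = {0: {}, 1: {}}
        for feats, y in zip(feature_sets, labels):
            for f in feats:
                self.count[y][f] = self.count[y].get(f, 0) + 1
        return self

    def _prob(self, f, y):
        # Smoothed estimate of P(feature present | class y), strictly in (0, 1).
        return (self.count[y].get(f, 0) + 1) / (self.n[y] + 2)

    def log_odds(self, feats):
        # Log-odds of malware vs. clean; features outside the vocabulary are ignored.
        score = math.log((self.n[1] + 1) / (self.n[0] + 1))
        for f in self.vocab:
            p1, p0 = self._prob(f, 1), self._prob(f, 0)
            if f in feats:
                score += math.log(p1 / p0)
            else:
                score += math.log((1 - p1) / (1 - p0))
        return score


def train_linear_svm(X, y, epochs=50, eta=0.1, lam=0.01, seed=0):
    """Linear SVM (no bias) via subgradient descent on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):
            sign = 1 if y[i] == 1 else -1
            margin = sign * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [wj * (1 - eta * lam) for wj in w]  # shrinkage from the L2 penalty
            if margin < 1:  # hinge loss is active for this sample
                w = [wj + eta * sign * xj for wj, xj in zip(w, X[i])]
    return w


class Ensemble:
    """One atomic NB per category; a linear SVM combines their log-odds scores."""

    def fit(self, samples, labels):
        self.categories = sorted(samples[0])
        self.atomic = {
            c: CategoryNB().fit([s[c] for s in samples], labels)
            for c in self.categories
        }
        # Simplification: the SVM is trained on scores for the same samples.
        self.w = train_linear_svm([self._scores(s) for s in samples], labels)
        return self

    def _scores(self, sample):
        return [self.atomic[c].log_odds(sample.get(c, set())) for c in self.categories]

    def predict(self, sample):
        return int(sum(w * x for w, x in zip(self.w, self._scores(sample))) > 0)


# Hypothetical toy data: each sample maps a feature category to a set of features.
TRAIN = [
    ({"permissions": {"SEND_SMS", "READ_CONTACTS"}, "api_calls": {"sendTextMessage"}}, 1),
    ({"permissions": {"SEND_SMS", "INTERNET"}, "api_calls": {"sendTextMessage", "openConnection"}}, 1),
    ({"permissions": {"INTERNET"}, "api_calls": {"openConnection"}}, 0),
    ({"permissions": {"INTERNET", "CAMERA"}, "api_calls": {"openConnection", "takePicture"}}, 0),
]
model = Ensemble().fit([s for s, _ in TRAIN], [y for _, y in TRAIN])
```

Calling `model.predict(...)` on a new sample returns 1 for malware and 0 for clean. Because each atomic classifier sees only one feature category, an attacker who disguises one category (for example, permissions) still has to evade the others, which is the diversification argument the abstract makes.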
Pages: 689-701
Number of pages: 13