Hybrid sequence-based Android malware detection using natural language processing

Cited by: 46
Authors
Zhang, Nan [1]
Xue, Jingfeng [1]
Ma, Yuxi [1]
Zhang, Ruyun [2]
Liang, Tiancai [3]
Tan, Yu-an [4]
Affiliations
[1] Beijing Inst Technol, Sch Comp, Beijing, Peoples R China
[2] Zhejiang Lab, Hangzhou, Zhejiang, Peoples R China
[3] GRG Banking Equipment Co Ltd, Guangzhou 510145, Peoples R China
[4] Beijing Inst Technol, Sch Cyberspace Sci & Technol, Beijing 100081, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Android malware detection; attention; deep learning; hybrid analysis; machine learning; natural language processing; text classification
DOI
10.1002/int.22529
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The Android platform has been a target of attackers because of its openness and growing popularity. Android malware has increased explosively in recent years, posing serious threats to Android security. Efficient Android malware detection methods are therefore crucial to defeating malware. Machine learning over features extracted by static or dynamic analysis has recently played an important role in malware detection. However, code obfuscation, code encryption, and dynamic code loading can hinder systems based solely on static analysis, while purely dynamic analysis systems cannot cover all potential code execution paths. To address these issues, we propose CoDroid, a sequence-based hybrid Android malware detection method that uses sequences of static opcodes and dynamic system calls. We treat each sequence as a sentence, as in natural language processing, and construct a CNN-BiLSTM-Attention classifier that combines Convolutional Neural Networks (CNNs) and a Bidirectional Long Short-Term Memory (BiLSTM) network with an attention language model. We extensively evaluate CoDroid on a real-world data set and perform a comprehensive analysis against existing related detection methods. The evaluations show the effectiveness and flexibility of CoDroid across a variety of experimental settings.
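The idea of treating an opcode or system-call sequence as a sentence can be illustrated with a minimal Python sketch. It is not CoDroid's implementation: the function names `build_vocab` and `encode`, the shared vocabulary, the padding scheme, and the sample sequences are all assumptions made for illustration. The sketch only shows the text-classification style preprocessing (tokens to integer ids, fixed length) that would precede an embedding layer and a classifier such as the CNN-BiLSTM-Attention model named in the abstract.

```python
from collections import Counter

PAD, UNK = "<pad>", "<unk>"  # hypothetical special tokens


def build_vocab(sequences, min_count=1):
    """Map each token (opcode or system-call name) to an integer id.

    Ids 0 and 1 are reserved for padding and unknown tokens; the rest
    are assigned by descending frequency, ties broken alphabetically.
    """
    counts = Counter(tok for seq in sequences for tok in seq)
    vocab = {PAD: 0, UNK: 1}
    for tok, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(seq, vocab, max_len):
    """Turn one sequence ("sentence") into a fixed-length list of ids."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in seq[:max_len]]
    return ids + [vocab[PAD]] * (max_len - len(ids))


# Illustrative sequences only; real ones would come from disassembled
# app code (static) and from a system-call trace (dynamic).
opcode_seq = ["const", "invoke-virtual", "move-result",
              "invoke-virtual", "return-void"]
syscall_seq = ["open", "read", "write", "close"]

# Assumption: one shared vocabulary, purely to keep the sketch short.
vocab = build_vocab([opcode_seq, syscall_seq])
encoded_opcodes = encode(opcode_seq, vocab, max_len=6)
encoded_syscalls = encode(["open", "ioctl", "close"], vocab, max_len=6)
```

Here `ioctl` was not seen when the vocabulary was built, so it maps to the unknown-token id, and short sequences are right-padded with the padding id so that all inputs have the same length.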
Pages: 5770-5784
Number of pages: 15