Defect Prediction in Android Binary Executables Using Deep Neural Network

Cited by: 1
Authors
Feng Dong
Junfeng Wang
Qi Li
Guoai Xu
Shaodong Zhang
Affiliations
[1] Beijing University of Posts and Telecommunications
[2] Sichuan University
Source
Wireless Personal Communications | 2018 / Vol. 102
Keywords
Software defect prediction; Mobile security; Android binary executables; Machine learning; Deep neural network;
DOI
Not available
Abstract
Software defect prediction locates defective code to help developers improve the security of software. However, existing studies on software defect prediction are mostly limited to source code, and defect prediction for Android binary executables (called apks) has not been explored in previous studies. In this paper, we present an exploratory study of defect prediction in Android apks. We first propose smali2vec, a new approach to generating features that capture the characteristics of the smali files (the decompiled files of apks) in apks. Smali2vec extracts both token and semantic features of the defective files in apks; such comprehensive features are needed for building accurate prediction models. We then use a deep neural network (DNN), one of the most common deep learning architectures, to train and build the defect prediction model. We apply our defect prediction model to more than 90,000 smali files from 50 Android apks, and the results show that the model achieves an AUC (the area under the receiver operating characteristic curve) of 85.98% and is capable of predicting defects in apks. Furthermore, the DNN is shown to perform better than the traditional shallow machine learning algorithms (e.g., support vector machine and naive Bayes) used in previous studies. The model has been used in our practical work and has helped locate many defective files in apks.
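To make the two technical notions in the abstract concrete, the following is a minimal sketch and not the authors' smali2vec implementation. It assumes that a "token feature" can be approximated by counting the leading opcode of each instruction line in a smali file, and it computes AUC through its rank interpretation (the probability that a defective file receives a higher score than a clean one). The function names `opcode_counts` and `auc` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def opcode_counts(smali_text):
    """Count the leading opcode of each instruction line in a smali file.

    Assumption: directives (.method, .line), comments (#), and labels
    (:cond_0) carry no opcode, so they are skipped.
    """
    counts = Counter()
    for line in smali_text.splitlines():
        line = line.strip()
        if not line or line[0] in ".#:":
            continue
        counts[line.split()[0]] += 1
    return counts


def auc(labels, scores):
    """AUC as the fraction of (defective, clean) pairs ranked correctly.

    labels: 1 for defective, 0 for clean. Tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one defective and one clean sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical smali fragment, used only to exercise the sketch.
SAMPLE = """
.method public foo()V
    const/4 v0, 0x0
    invoke-virtual {p0}, Lcom/a;->b()V
    :cond_0
    const/4 v1, 0x1
    return-void
.end method
"""

features = opcode_counts(SAMPLE)
score = auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6])
```

In a full pipeline, per-file counts like these would be turned into fixed-length vectors and passed to a classifier, and `auc` would be evaluated on the classifier's scores for held-out files.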
Pages: 2261-2285
Page count: 24