Experimental Comparison of Features and Classifiers for Android Malware Detection

Times cited: 6
Authors
Shar, Lwin Khin [1]
Demissie, Biniam Fisseha [2]
Ceccato, Mariano [3]
Minn, Wei [1]
Affiliations
[1] Singapore Management University, Singapore, Singapore
[2] Fondazione Bruno Kessler, Povo, Italy
[3] University of Verona, Verona, Italy
Source
2020 IEEE/ACM 7th International Conference on Mobile Software Engineering and Systems (MOBILESoft) | 2020
Funding
National Research Foundation, Singapore
Keywords
Malware detection; machine learning; deep learning; Android;
DOI
10.1145/3387905.3388596
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
The Android platform has dominated the smartphone market for years and, consequently, has gained a lot of attention from attackers. Malicious apps (malware) pose a serious threat to the security and privacy of Android smartphone users. Available machine learning approaches to detecting mobile malware rely on features extracted with static or dynamic analysis techniques. Different types of conventional machine learning classifiers (such as support vector machines and random forests) and deep learning classifiers (based on deep neural networks) are then trained on the extracted features to produce models that can detect mobile malware. The commonly analyzed features include permissions requested/used, frequency of API calls, use of API calls, and sequence of API calls. The API calls are analyzed at various granularity levels, such as method, class, package, and family. Given the variety of proposed classifiers, feature types, and underlying analyses used for feature extraction, there is a need for a comprehensive evaluation of the effectiveness of current state-of-the-art malware detection studies on a common benchmark. In this work, we provide a baseline comparison of several conventional machine learning classifiers and deep learning classifiers, without fine tuning. We also evaluate different types of features that characterize the use of API calls at class level and the sequence of API calls at method level. Features have been extracted from a common benchmark of 4572 benign samples and 2399 malware samples, using both static analysis and dynamic analysis. Among other interesting findings, we observed that classifiers trained on the use of API calls generally perform better than those trained on the sequence of API calls. Classifiers trained on static analysis-based features perform better than those trained on dynamic analysis-based features. Deep learning classifiers, despite their sophistication, are not necessarily better than conventional classifiers, especially when they are not optimized. However, deep learning classifiers do perform better than conventional classifiers when trained on dynamic analysis-based features.
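The abstract distinguishes features that record the use, frequency, or sequence of API calls, abstracted to method, class, package, or family level. The minimal Python sketch below illustrates these encodings on one app's call trace. Everything in it is an illustrative assumption rather than the authors' feature-extraction pipeline: the hypothetical trace `TRACE`, the fixed class vocabulary `VOCAB`, the hypothetical helper names, the dot-splitting rule used for abstraction, and the choice of bigrams to represent call sequences.

```python
from collections import Counter

# Hypothetical API call trace of one app (illustrative only), each entry
# written as "package.Class.method" without parameter types.
TRACE = [
    "android.telephony.SmsManager.sendTextMessage",
    "java.net.URL.openConnection",
    "android.telephony.TelephonyManager.getDeviceId",
    "java.net.URL.openConnection",
]

# Hypothetical fixed vocabulary of API classes shared by all apps, so every
# app maps to a feature vector of the same length.
VOCAB = [
    "android.location.LocationManager",
    "android.telephony.SmsManager",
    "android.telephony.TelephonyManager",
    "java.net.URL",
]


def abstract_call(call: str, level: str) -> str:
    """Abstract a 'package.Class.method' call to the requested granularity.

    Simplified rule (an assumption): class = drop the method name,
    package = drop class and method name, family = first name segment.
    """
    parts = call.split(".")
    if level == "method":
        return call
    if level == "class":
        return ".".join(parts[:-1])
    if level == "package":
        return ".".join(parts[:-2])
    if level == "family":
        return parts[0]
    raise ValueError(f"unknown granularity level: {level}")


def use_features(trace: list[str], vocab: list[str], level: str = "class") -> list[int]:
    """Binary 'use' vector: 1 if the app calls the abstracted API at all."""
    seen = {abstract_call(c, level) for c in trace}
    return [1 if v in seen else 0 for v in vocab]


def frequency_features(trace: list[str], vocab: list[str], level: str = "class") -> list[int]:
    """'Frequency' vector: number of calls to each abstracted API."""
    counts = Counter(abstract_call(c, level) for c in trace)
    return [counts[v] for v in vocab]  # Counter returns 0 for unseen keys


def sequence_features(trace: list[str], level: str = "method", n: int = 2) -> list[tuple[str, ...]]:
    """Order-preserving 'sequence' features: consecutive n-grams of calls."""
    calls = [abstract_call(c, level) for c in trace]
    return [tuple(calls[i : i + n]) for i in range(len(calls) - n + 1)]


if __name__ == "__main__":
    print(use_features(TRACE, VOCAB))        # [0, 1, 1, 1]
    print(frequency_features(TRACE, VOCAB))  # [0, 1, 1, 2]
    print(sequence_features(TRACE)[0])
```

The use and frequency vectors discard call order, whereas the n-gram list keeps it; vectors of this kind would then be fed to a conventional classifier or a neural network. Whether the paper encodes sequences as n-grams or as raw ordered lists is not stated in this record.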
Pages: 50-60
Number of pages: 11