Android malware detection: An in-depth investigation of the impact of the use of imbalance datasets on the efficiency of machine learning models

Cited by: 2
Authors
Sawadogo, Zakaria [1, 2, 3]
Dembele, Jean-Marie [2, 5]
Mendy, Gervais [1, 4]
Ouya, Samuel [1]
Affiliations
[1] Cheikh Anta Diop Univ, LITA, Lab Comp Sci Telecommun & Applicat, Dakar, Senegal
[2] Gaston Berger Univ, LANI, Lab Numer Anal & Comp Sci, Dakar, Senegal
[3] Gaston Berger Univ, Dakar, Senegal
[4] Univ Cheikh Anta Diop, ESP Polytechn Sch, Dakar, Senegal
[5] Gaston Berger Univ, Comp Sci, Dakar, Senegal
Source
2023 25TH INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATION TECHNOLOGY, ICACT | 2023
Keywords
imbalanced dataset; Android malware detection; Malware classification; Artificial intelligence; Machine learning; Performance
DOI
10.23919/ICACT56868.2023.10079245
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Machine learning techniques have become an essential part of research into the detection and classification of malicious applications. Several approaches and algorithms learn from existing data and predict classes. Machine learning principles recommend balanced classes in the training dataset, but the reality on the ground is quite different: the majority of datasets used for malicious application detection are unbalanced. Class imbalance degrades classifier performance, which makes it a common problem in classification tasks, and the problem is especially significant in Android malware detection and classification. To our knowledge, there is little work on the effects of unbalanced datasets in this field. Our contribution focuses on the impact of unbalanced datasets on the performance of different algorithms, the relevance of the evaluation metrics used in Android malware detection, and the state of the databases from which researchers typically draw datasets. We show that, for malicious application detection, some classification algorithms are not suitable for unbalanced datasets. We also show that some of the most widely used performance evaluation metrics in the literature (Accuracy, Precision, Recall) are not well suited to unbalanced datasets, whereas Balanced Accuracy and Geometric Mean are more suitable. These results were obtained by evaluating the performance of eleven classification algorithms as well as the adequacy of the different evaluation metrics (Accuracy, Recall, Precision, F1 score, Balanced Accuracy, Matthews correlation coefficient, Geometric Mean, Fowlkes-Mallows index). In addition, not all databases are accessible to researchers, and many of them are not kept up to date.
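The abstract's point about metrics can be illustrated with a small sketch. The data below is a hypothetical toy set (95 benign apps, 5 malware samples), not the paper's data, and the classifier is a trivial one that labels everything as benign. The helper names `confusion_counts` and `imbalance_metrics` are illustrative, and the geometric mean is assumed to follow the common definition, the square root of sensitivity times specificity, which may differ in detail from the one used in the paper.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, tn, fp, fn


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return accuracy, balanced accuracy, and geometric mean as a dict."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    # Sensitivity is the recall on the positive (malware) class;
    # specificity is the recall on the negative (benign) class.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "geometric_mean": math.sqrt(sensitivity * specificity),
    }


# Hypothetical toy data: 0 = benign, 1 = malware (assumed labels).
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that never predicts malware.
y_pred = [0] * 100

scores = imbalance_metrics(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

On this toy set the degenerate classifier reaches high accuracy while detecting no malware at all, whereas balanced accuracy falls to chance level and the geometric mean to zero, which is the kind of gap between metrics that the abstract describes.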
Pages: 1460-1467
Number of pages: 8
Related papers
50 in total
  • [31] A Machine Learning Approach for Real Time Android Malware Detection
    Ngoc C Le
    Tien-Manh Nguyen
    Trang Truong
    Ngoc-Dam Nguyen
    Tra Ngo
    2020 RIVF INTERNATIONAL CONFERENCE ON COMPUTING & COMMUNICATION TECHNOLOGIES (RIVF 2020), 2020, : 347 - 352
  • [32] A Comprehensive Survey on Machine Learning Techniques for Android Malware Detection
    Kouliaridis, Vasileios
    Kambourakis, Georgios
    INFORMATION, 2021, 12 (05)
  • [33] A Review of Android Malware Detection Approaches Based on Machine Learning
    Liu, Kaijun
    Xu, Shengwei
    Xu, Guoai
    Zhang, Miao
    Sun, Dawei
    Liu, Haifeng
    IEEE ACCESS, 2020, 8 (08): 124579 - 124607
  • [34] Android Malware Detection Using Parallel Machine Learning Classifiers
    Yerima, Suleiman Y.
    Sezer, Sakir
    Muttik, Igor
    2014 EIGHTH INTERNATIONAL CONFERENCE ON NEXT GENERATION MOBILE APPS, SERVICES AND TECHNOLOGIES (NGMAST), 2014, : 37 - 42
  • [35] Android Malware Detection Using Machine Learning on Image Patterns
    Darus, Falai Mohd
    Salleh, Noor Azurati Alimad
    Ariffin, Aswami Fadillah Mohd
    PROCEEDINGS OF THE 2018 CYBER RESILIENCE CONFERENCE (CRC), 2018,
  • [36] AndyWar: an intelligent android malware detection using machine learning
    Roy, Sandipan
    Bhanja, Samit
    Das, Abhishek
    Innovations in Systems and Software Engineering, 2023,
  • [37] Android Malware Detection Using API Calls: A Comparison of Feature Selection and Machine Learning Models
    Muzaffar, Ali
    Hassen, Hani Ragab
    Lones, Michael A.
    Zantout, Hind
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON APPLIED CYBER SECURITY (ACS) 2021, 2022, 378 : 3 - 12
  • [38] Analysis of machine learning models for malware detection
    Rahul
    Kedia, Priyansh
    Sarangi, Subrat
    Monika
    JOURNAL OF DISCRETE MATHEMATICAL SCIENCES & CRYPTOGRAPHY, 2020, 23 (02): 395 - 407
  • [39] Machine learning based hybrid behavior models for Android malware analysis
    Chuang, Hsin-Yu
    Wang, Sheng-De
    2015 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE SECURITY AND RELIABILITY (QRS 2015), 2015, : 201 - 206
  • [40] Application of Machine Learning Models for Malware Classification With Real and Synthetic Datasets
    Joshi, Santosh
    Pons, Alexander Perez
    Kulkarni, Shrirang Ambaji
    Upadhyay, Himanshu
    INTERNATIONAL JOURNAL OF INFORMATION SECURITY AND PRIVACY, 2024, 18 (01)