Android malware detection: An in-depth investigation of the impact of the use of imbalance datasets on the efficiency of machine learning models

Cited by: 2
Authors
Sawadogo, Zakaria [1, 2, 3]
Dembele, Jean-Marie [2, 5]
Mendy, Gervais [1, 4]
Ouya, Samuel [1]
Affiliations
[1] Cheikh Anta Diop Univ, LITA, Lab Comp Sci Telecommun & Applicat, Dakar, Senegal
[2] Gaston Berger Univ, LANI, Lab Numer Anal & Comp Sci, Dakar, Senegal
[3] Gaston Berger Univ, Dakar, Senegal
[4] Univ Cheikh Anta Diop, ESP Polytechn Sch, Dakar, Senegal
[5] Gaston Berger Univ, Comp Sci, Dakar, Senegal
Keywords
Imbalanced dataset; Android malware detection; Malware classification; Artificial intelligence; Machine learning; Performance
DOI
10.23919/ICACT56868.2023.10079245
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Machine learning techniques have become an essential part of research into the detection and classification of malicious applications. Several approaches and algorithms learn from existing data and predict classes. Machine learning practice recommends balanced classes in the training dataset, but the reality is quite different: the majority of datasets used for malicious application detection are imbalanced. Class imbalance degrades classifier performance and is therefore a common problem in classification tasks, and it is especially significant in Android malware detection and classification. To our knowledge, little work addresses the effects of imbalanced datasets in this field. Our contribution focuses on the impact of imbalanced datasets on the performance of different algorithms, on the relevance of the evaluation metrics used in Android malware detection, and on the state of the databases from which researchers typically draw datasets. We show that some classification algorithms are not suitable for imbalanced datasets in malicious application detection. We also show that some of the most widely used performance evaluation metrics in the literature (Accuracy, Precision, Recall) are poorly suited to imbalanced datasets, whereas Balanced Accuracy and Geometric Mean are more suitable. These results were obtained by evaluating the performance of eleven classification algorithms and the adequacy of eight evaluation metrics (Accuracy, Recall, Precision, F1 score, Balanced Accuracy, Matthews correlation coefficient, Geometric Mean, Fowlkes-Mallows index). Finally, not all databases are accessible to researchers, and many of them are not kept up to date.
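The contrast the abstract draws between Accuracy and the imbalance-aware metrics can be illustrated with a small sketch. It is not the authors' experimental code: the 950:50 benign-to-malware split, the trivial "always benign" predictor, and the helper names `confusion_counts` and `imbalance_aware_metrics` are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for a binary problem (positive = malware here)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def imbalance_aware_metrics(y_true, y_pred, positive=1):
    """Return Accuracy, Balanced Accuracy, and Geometric Mean as a dict."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    # Per-class recall; fall back to 0.0 when a class is absent.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on malware
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on benign
    return {
        "accuracy": (tp + tn) / total,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "geometric_mean": math.sqrt(sensitivity * specificity),
    }


# Hypothetical imbalanced test set: 950 benign (0) and 50 malware (1) samples.
y_true = [0] * 950 + [1] * 50
# Hypothetical degenerate classifier that labels every app as benign.
y_pred = [0] * 1000

metrics = imbalance_aware_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed numbers the predictor detects no malware at all, yet Accuracy still reaches 0.95, while Balanced Accuracy drops to 0.5 and Geometric Mean to 0, which is the kind of gap the abstract attributes to class imbalance.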
Pages: 1460-1467
Page count: 8