A Method for Class-Imbalance Learning in Android Malware Detection

Cited by: 3
Authors
Guan, Jun [1]
Jiang, Xu [1]
Mao, Baolei [2]
Affiliations
[1] Northwestern Polytech Univ, Sch Automat, Xian 710072, Peoples R China
[2] Zhengzhou Univ, Cooperat Innovat Ctr Internet Healthcare, Zhengzhou 450000, Peoples R China
Keywords
random forest; SMOTE; Android malware; imbalance data; clustering; under-sampling; performance
DOI
10.3390/electronics10243124
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Android application developers increasingly adopt measures against reverse engineering, such as adding a shell (packing), so certain features cannot be obtained through decompilation. This causes a serious sample imbalance in machine-learning-based Android malware detection. Researchers have therefore focused on solving class imbalance to improve detection performance. However, existing class-imbalance learning methods mainly suffer from the loss of valuable samples and from computational cost. In this paper, we propose a Class-Imbalance Learning (CIL) method that first selects representative features, then uses K-Means clustering and under-sampling to reduce the number of majority-class samples while retaining the important ones. Next, the Synthetic Minority Over-Sampling Technique (SMOTE) generates minority-class samples to balance the data, and finally the Random Forest (RF) algorithm builds the malware detection model. Experimental results indicate that CIL effectively improves the performance of machine-learning-based Android malware detection, especially under class imbalance. Compared with existing class-imbalance learning methods, CIL is also effective on data sets from the University of California, Irvine (UCI) Machine Learning Repository and performs better on some of them.
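The resampling pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`cluster_undersample`, `smote`, `cil_fit`), the number of clusters, the proportional random draw from each cluster, and the synthetic data are all assumptions, and the feature-selection step is omitted. SMOTE is written out by hand with NumPy so that only scikit-learn is required.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import NearestNeighbors


def cluster_undersample(X_maj, n_keep, n_clusters=5, seed=0):
    """Keep roughly n_keep majority samples, drawn from every K-Means cluster.

    Assumption: each cluster contributes in proportion to its size, so no
    region of the majority class is dropped entirely.
    """
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X_maj)
    kept = []
    for c in range(n_clusters):
        idx = np.flatnonzero(labels == c)
        if idx.size == 0:
            continue
        quota = max(1, int(round(n_keep * idx.size / len(X_maj))))
        kept.append(rng.choice(idx, size=min(quota, idx.size), replace=False))
    return X_maj[np.concatenate(kept)]


def smote(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between
    a minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    k = min(k, len(X_min) - 1)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neigh = nn.kneighbors(X_min, return_distance=False)[:, 1:]  # drop the point itself
    base = rng.integers(0, len(X_min), size=n_new)
    partner = neigh[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[partner] - X_min[base])


def cil_fit(X, y, n_keep, seed=0):
    """Under-sample class 0 (majority), over-sample class 1 (minority), fit RF."""
    X_maj, X_min = X[y == 0], X[y == 1]
    X_maj_small = cluster_undersample(X_maj, n_keep, seed=seed)
    n_new = max(0, len(X_maj_small) - len(X_min))
    X_min_big = np.vstack([X_min, smote(X_min, n_new, seed=seed)])
    X_bal = np.vstack([X_maj_small, X_min_big])
    y_bal = np.r_[np.zeros(len(X_maj_small), dtype=int), np.ones(len(X_min_big), dtype=int)]
    model = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X_bal, y_bal)
    return model, X_bal, y_bal


# Synthetic stand-in data: 500 majority vs. 25 minority samples, 4 features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (500, 4)), rng.normal(2, 1, (25, 4))])
y = np.r_[np.zeros(500, dtype=int), np.ones(25, dtype=int)]
model, X_bal, y_bal = cil_fit(X, y, n_keep=100)
```

Under-sampling first keeps the SMOTE step small, since fewer synthetic minority samples are needed to reach balance. In practice the model would be evaluated on a held-out, unresampled test split.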
Pages: 14
Related papers
39 in total
  • [1] A novel framework for prognostic factors identification of malignant mesothelioma through association rule mining
    Alam, Talha Mahboob
    Shaukat, Kamran
    Hameed, Ibrahim A.
    Khan, Wasim Ahmad
    Sarwar, Muhammad Umer
    Iqbal, Farhat
    Luo, Suhuai
    [J]. BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2021, 68
  • [2] A Comprehensive Analysis of the Android Permissions System
    Almomani, Iman M.
    Al Khayer, Aala
    [J]. IEEE ACCESS, 2020, 8 : 216671 - 216688
  • [3] Android Malware Family Classification and Analysis: Current Status and Future Directions
    Alswaina, Fahad
    Elleithy, Khaled
    [J]. ELECTRONICS, 2020, 9 (06) : 1 - 20
  • [4] Proteus: a random forest classifier to predict disorder-to-order transitioning binding regions in intrinsically disordered proteins
    Basu, Sankar
    Soderquist, Fredrik
    Wallner, Bjorn
    [J]. JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN, 2017, 31 (05) : 453 - 466
  • [5] Batista GE., 2004, ACM SIGKDD EXPL NEWS, V6, P20, DOI 10.1145/1007730.1007735
  • [6] A systematic study of the class imbalance problem in convolutional neural networks
    Buda, Mateusz
    Maki, Atsuto
    Mazurowski, Maciej A.
    [J]. NEURAL NETWORKS, 2018, 106 : 249 - 259
  • [7] Chawla NV, 2010, DATA MINING AND KNOWLEDGE DISCOVERY HANDBOOK, SECOND EDITION, P875, DOI 10.1007/978-0-387-09823-4_45
  • [8] Simulated annealing based undersampling (SAUS): a hybrid multi-objective optimization method to tackle class imbalance
    Chennuru, Venkata Krishnaveni
    Timmappareddy, Sobha Rani
    [J]. APPLIED INTELLIGENCE, 2022, 52 (02) : 2092 - 2110
  • [9] Ensembles of feature selectors for dealing with class-imbalanced datasets: A proposal and comparative study
    de Haro-Garcia, Aida
    Cerruela-Garcia, Gonzalo
    Garcia-Pedrajas, Nicolas
    [J]. INFORMATION SCIENCES, 2020, 540 (540) : 89 - 116
  • [10] Duan Y., 2018, 25th Annual Network and Distributed System Security Symposium, NDSS, P18