Improving Machine Learning Models for Malware Detection Using Embedded Feature Selection Method

Cited by: 16
Authors
Chemmakha, Mohammed [1]
Habibi, Omar [1]
Lazaar, Mohamed [1]
Affiliation
[1] Mohammed V Univ Rabat, ENSIAS, Rabat, Morocco
Keywords
Feature Selection; Machine Learning; Malware Detection; LightGBM; Random Forest; Support Vector Machine (SVM); ANN; XGBoost
DOI
10.1016/j.ifacol.2022.07.406
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Machine learning performance always relies on a relevant pre-processing phase, which includes dataset cleaning, cleansing and extraction. Feature selection (FS) is a crucial phase too, because it is intended to increase the predictive efficiency of Machine Learning (ML) models by assigning a representative value to the most important features in a malware dataset. In this study, we focus on feature selection using embedded-based methods in order to minimize the computational time and complexity of ML models. Embedded-based methods combine the advantages of both filter-based and wrapper-based methods: they assess feature importance while the model is being executed, and they have a reduced execution time. Applying ML models shows high model stability while selecting the 10 most relevant features from the dataset, with accuracies reaching 99.47% and 99.02% for Random Forest (RF) and XGBoost (XGB), respectively. Copyright (C) 2022 The Authors.
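The embedded selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the data is synthetic (generated with scikit-learn's `make_classification`, standing in for a malware dataset), the hyperparameters are arbitrary, and only the idea of keeping the 10 features ranked highest by a Random Forest's own importance scores is taken from the abstract.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split

# Assumption: synthetic binary data stands in for the malware dataset.
X, y = make_classification(
    n_samples=600, n_features=40, n_informative=8, n_redundant=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Embedded selection: the forest is trained once, and its impurity-based
# feature importances rank the features. threshold=-inf together with
# max_features=10 keeps exactly the 10 highest-ranked features.
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=0),
    threshold=-np.inf,
    max_features=10,
)
selector.fit(X_train, y_train)
selected_idx = selector.get_support(indices=True)

# Retrain a classifier on the reduced feature set and score it on held-out data.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(selector.transform(X_train), y_train)
accuracy = clf.score(selector.transform(X_test), y_test)

print("Selected feature indices:", selected_idx)
print("Held-out accuracy on 10 features:", accuracy)
```

The selector is fitted on the training split only, so the held-out score is not influenced by the test data. The accuracy printed here reflects the synthetic data and says nothing about the figures reported in the paper.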
Pages: 771-776
Page count: 6