An Effective Malware Detection Method Using Hybrid Feature Selection and Machine Learning Algorithms

Cited by: 0
Authors
Namita Dabas
Prachi Ahlawat
Prabha Sharma
Affiliation
[1] The NorthCap University, School of Engineering and Technology
Source
Arabian Journal for Science and Engineering | 2023 / Vol. 48
Keywords
Malware detection; API calls; API sequences; Frequent patterns; Feature selection; Machine learning;
DOI
Not available
Abstract
With the advent of internet-based technology, the number of internet-enabled devices has surged. These devices generate massive volumes of meaningful information to accomplish a variety of tasks, and cyber-criminals exploit this information to carry out cyber-attacks. Malware is one of the most prevalent attacks in the cyber threat landscape and a common means for cyber-criminals to fulfil their malicious intents, so detecting and preventing malware attacks precisely is imperative to minimize the damage. Several researchers have shown that API calls capture malware behaviour accurately and can be used with machine learning algorithms to detect malware effectively. This paper therefore proposes a novel malware detection method for the Windows platform based on API calls, feature selection, and machine learning algorithms. It extracts API call information in three forms (API call usage, API call frequency, and API call sequences) to create three feature sets. These feature sets are enriched using the TF-IDF technique and combined into a more extensive and robust feature set, the API integrated feature set. A series of experiments showed that the API integrated feature set outperformed the other feature sets, attaining an accuracy of 99.6% or higher for all machine learning algorithms. To address the high dimensionality of the API integrated feature set, this work applied several feature selection techniques; the results showed that hybrid feature selection combined with machine learning algorithms achieves 99.6–99.9% accuracy with only 9% of the features of the API integrated feature set.
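The feature construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy traces, the function names, the use of contiguous 2-grams as a stand-in for API call sequences, and the plain tf × log(N/df) weighting are all assumptions made for illustration. It builds usage, frequency, and sequence features from API call traces, weights each set with TF-IDF, and merges them into one combined feature set.

```python
import math
from collections import Counter

# Hypothetical toy traces: each sample is an ordered list of API call names.
traces = [
    ["CreateFileW", "WriteFile", "CloseHandle"],
    ["CreateFileW", "WriteFile", "WriteFile", "RegSetValueExW"],
    ["VirtualAlloc", "WriteProcessMemory", "CreateRemoteThread"],
]

def usage_features(trace):
    """Binary presence of each API call (API call usage)."""
    return {api: 1 for api in set(trace)}

def frequency_features(trace):
    """Raw count of each API call (API call frequency)."""
    return dict(Counter(trace))

def sequence_features(trace, n=2):
    """Counts of contiguous n-grams; an assumed stand-in for API call sequences."""
    grams = ["->".join(trace[i:i + n]) for i in range(len(trace) - n + 1)]
    return dict(Counter(grams))

def tfidf(docs):
    """Weight each {term: count} dict by tf * log(N / df) (assumed variant)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in doc)
    weighted = []
    for doc in docs:
        total = sum(doc.values()) or 1  # guard against empty traces
        weighted.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in doc.items()}
        )
    return weighted

def integrated_features(traces):
    """Merge the three TF-IDF-weighted feature sets under prefixed keys."""
    extractors = {
        "use": usage_features,
        "freq": frequency_features,
        "seq": sequence_features,
    }
    combined = [{} for _ in traces]
    for prefix, extract in extractors.items():
        weighted = tfidf([extract(t) for t in traces])
        for row, feats in zip(combined, weighted):
            row.update({f"{prefix}:{k}": v for k, v in feats.items()})
    return combined

features = integrated_features(traces)
```

Each entry of `features` is a sparse dictionary for one sample. A feature selection step and a classifier would follow in a full pipeline; both are omitted here because the abstract does not specify them.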
Pages: 9749-9767
Number of pages: 18